Abstract. We prove analogues for hypergraphs of Szemerédi's regularity lemma and the
associated counting lemma for graphs. As an application, we give the first combinatorial
proof of the multidimensional Szemerédi theorem of Furstenberg and Katznelson, and the
first proof that provides an explicit bound. Similar results with the same consequences
have been obtained independently by Nagle, Rödl, Schacht and Skokan.

Szemerédi's theorem states that, for every real number $\delta > 0$ and every positive integer
$k$, there exists a positive integer $N$ such that every subset $A$ of the set $\{1, 2, \dots, N\}$ of
size at least $\delta N$ contains an arithmetic progression of length $k$. There are now three
substantially different proofs of the theorem: Szemerédi's original combinatorial argument
[Sz1], an ergodic-theory proof due to Furstenberg (see for example [FKO]) and a proof by
the author using Fourier analysis [G1]. Interestingly, there has for some years been a highly
promising programme for yet another proof of the theorem, pioneered by Vojta Rödl (see
for example [R]), developing an argument of Ruzsa and Szemerédi [RS] that proves the
result for progressions of length three. Let us briefly sketch their argument.
The first step is the famous regularity lemma of Szemerédi [Sz2]. If $G$ is a graph
and $A$ and $B$ are sets of vertices of $G$, then let $e(A, B)$ stand for the number of pairs
$(x, y) \in A \times B$ such that $xy$ is an edge of $G$. Then the density $d(A, B)$ of the pair $(A, B)$
is $e(A, B)/|A||B|$. The pair is $\epsilon$-regular if $|d(A', B') - d(A, B)| \le \epsilon$ for all subsets $A' \subset A$
and $B' \subset B$ such that $|A'| \ge \epsilon|A|$ and $|B'| \ge \epsilon|B|$. The basic idea is that a pair is regular
with density $d$ if it resembles a random graph with edge-probability $d$. Very roughly, the
regularity lemma asserts that every graph can be decomposed into a few pieces, almost all
of which are random-like. The precise statement is as follows.

Theorem 1.1. Let $\epsilon > 0$. Then there exists a positive integer $K_0$ such that, given any
graph $G$, the vertices can be partitioned into $K \le K_0$ sets $V_i$, with sizes differing by at
most 1, such that all but at most $\epsilon K^2$ of the pairs $(V_i, V_j)$ are $\epsilon$-regular.

A partition is called $\epsilon$-regular if it satisfies the conclusion of Theorem 1.1. (Note that we
allow $i$ to equal $j$ in the definition of a regular pair, though if $K$ is large then this does
not make too much difference.) The regularity lemma is particularly useful in conjunction
with a further result, known as the counting lemma. To state it, it is very convenient
to use the notion of a graph homomorphism. If $G$ and $H$ are graphs, then a function
$\phi : V(H) \to V(G)$ is called a homomorphism if $\phi(x)\phi(y)$ is an edge of $G$ whenever $xy$ is
an edge of $H$. It is an isomorphic embedding if in addition $\phi(x)\phi(y)$ is not an edge of $G$
whenever $xy$ is not an edge of $H$.
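The two notions just defined can be checked mechanically. The following Python sketch is only an illustration under our own conventions (a map is a dictionary, a graph is a collection of two-element edges, and the function names are hypothetical); it implements the definitions exactly as stated above.

```python
from itertools import combinations


def is_homomorphism(phi, H_edges, G_edges):
    """phi(x)phi(y) is an edge of G whenever xy is an edge of H."""
    G = {frozenset(e) for e in G_edges}
    return all(frozenset((phi[x], phi[y])) in G for x, y in H_edges)


def is_isomorphic_embedding(phi, H_vertices, H_edges, G_edges):
    """A homomorphism that also sends non-edges of H to non-edges of G."""
    H = {frozenset(e) for e in H_edges}
    G = {frozenset(e) for e in G_edges}
    for x, y in combinations(H_vertices, 2):
        image_is_edge = frozenset((phi[x], phi[y])) in G
        if (frozenset((x, y)) in H) != image_is_edge:
            return False
    return True
```

Mapping a path with three vertices onto a triangle is a homomorphism but not an isomorphic embedding, because the two end vertices of the path are not adjacent while their images are.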

Theorem 1.2. For every $a > 0$ and every $k$ there exists $\epsilon > 0$ with the following property.
Let $V_1, \dots, V_k$ be sets of vertices in a graph $G$, and suppose that for each pair $(i, j)$ the
pair $(V_i, V_j)$ is $\epsilon$-regular with density $d_{ij}$. Let $H$ be a graph with vertex set $\{x_1, \dots, x_k\}$,
let $v_i \in V_i$ be chosen independently and uniformly at random, and let $\phi$ be the map that
takes $x_i$ to $v_i$ for each $i$. Then the probability that $\phi$ is an isomorphic embedding differs
from $\prod_{x_i x_j \in H} d_{ij} \prod_{x_i x_j \notin H} (1 - d_{ij})$ by at most $a$.

Roughly, this result tells us that the $k$-partite graph induced by the sets $V_1, \dots, V_k$ contains
the right number of labelled induced copies of the graph $H$. Let us briefly see why this
result is true when $H$ is a triangle. Suppose that $U$, $V$, $W$ are three sets of vertices and the
pairs $(U, V)$, $(V, W)$ and $(W, U)$ are $\epsilon$-regular with densities $\zeta$, $\eta$ and $\theta$ respectively. Then
a typical vertex of $U$ has about $\zeta|V|$ neighbours in $V$ and $\theta|W|$ neighbours in $W$. By the
regularity of the pair $(V, W)$, these two neighbourhoods span about $\eta(\zeta|V|)(\theta|W|)$ edges
in $G$, creating that many triangles. Summing over all vertices of $U$ we obtain the result.
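The heuristic for triangles can be tested numerically. The sketch below is an illustration only: it assumes a random tripartite graph as a stand-in for a regular triple (random graphs are regular with high probability), and all names, the seed and the tolerance are our own choices. It compares the fraction of triples $(u, v, w)$ that form a triangle with the product of the three pair densities.

```python
import random
from itertools import product


def random_tripartite(n, p, seed=0):
    """Three vertex classes of size n; each cross pair is an edge with probability p."""
    rng = random.Random(seed)
    U = [("u", i) for i in range(n)]
    V = [("v", i) for i in range(n)]
    W = [("w", i) for i in range(n)]
    edges = set()
    for A, B in ((U, V), (V, W), (W, U)):
        for a, b in product(A, B):
            if rng.random() < p:
                edges.add(frozenset((a, b)))
    return U, V, W, edges


def pair_density(edges, A, B):
    """d(A, B) as a floating-point number."""
    hits = sum(frozenset((a, b)) in edges for a, b in product(A, B))
    return hits / (len(A) * len(B))


def triangle_fraction(edges, U, V, W):
    """Probability that a uniformly random (u, v, w) spans a triangle."""
    hits = sum(
        frozenset((u, v)) in edges
        and frozenset((v, w)) in edges
        and frozenset((w, u)) in edges
        for u, v, w in product(U, V, W)
    )
    return hits / (len(U) * len(V) * len(W))


def predicted_fraction(edges, U, V, W):
    """The product of the three pair densities, as in the counting lemma."""
    return (pair_density(edges, U, V)
            * pair_density(edges, V, W)
            * pair_density(edges, W, U))
```

Calling `triangle_fraction` and `predicted_fraction` on the output of `random_tripartite(20, 0.5)` should give two numbers that agree to within a few percent; for a graph that is far from regular the two can differ substantially.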

The next step in the chain of reasoning is the following innocent-looking statement 
about graphs with few triangles. Some of the details of the proof will be sketched rather 
than given in full. 

Theorem 1.3. For every constant $a > 0$ there exists a constant $c > 0$ with the following
property. If $G$ is any graph with $n$ vertices that contains at most $cn^3$ triangles, then it is
possible to remove at most $an^2$ edges from $G$ to make it triangle-free.

Proof. This theorem is a simple consequence of the regularity lemma. Indeed, let $\epsilon =
\epsilon(a) > 0$ be sufficiently small and let $V_1, \dots, V_K$ be an $\epsilon$-regular partition of the vertices
of $G$. If there are fewer than $a|V_i||V_j|/100$ edges between $V_i$ and $V_j$, then remove all
those edges, and also remove all edges from $V_i$ to $V_j$ if $(V_i, V_j)$ is not an $\epsilon$-regular pair.
Since the partition is $\epsilon$-regular, we have removed fewer than $an^2$ edges, and the resulting
graph must either be triangle-free or contain several triangles. To see why this is, suppose
that $(x, y, z)$ is a triangle in $G$ (after the edges have been removed), and suppose that
$(x, y, z) \in V_i \times V_j \times V_k$. Then by our construction the pair $(V_i, V_j)$ must be regular and
must span many edges (because we did not remove the edge $(x, y)$) and similarly for the
pairs $(V_j, V_k)$ and $(V_i, V_k)$. But then, by the counting lemma for triangles, the sets $V_i$, $V_j$
and $V_k$ span at least $a^3|V_i||V_j||V_k|/10^6$ triangles. Each $V_i$ has cardinality at least $n/2K$,
where $K$ depends on $\epsilon$ only (which itself depends on $a$ only). This proves that the result
is true provided that $c \le a^3/2^3 10^6 K^3$. □

Ruzsa and Szemerédi [RS] observed that Theorem 1.3 implies Szemerédi's theorem
for progressions of length 3. More recently, Solymosi noticed [So1,2] that it also implied
the following two-dimensional generalization. (Actually, neither of these statements is
quite accurate. There are several closely related graph-theoretic results that have these
consequences and can be proved using the regularity lemma, of which Theorem 1.3 is one.
Ruzsa and Szemerédi and Solymosi did not use Theorem 1.3 itself but their arguments are
not importantly different.)

Corollary 1.4. For every $\delta > 0$ there exists $N$ such that every subset $A \subset [N]^2$ of size at
least $\delta N^2$ contains a triple of the form $(x, y)$, $(x + d, y)$, $(x, y + d)$ with $d > 0$.

Proof. First, note that an easy argument allows us to replace $A$ by a set $B$ that is
symmetric about some point. Briefly, if the point $(x, y)$ is chosen at random then the
intersection $B$ of $A$ with $(x, y) - A$ has expected size $c\delta^2 N^2$ for some absolute constant $c > 0$,
lives inside the grid $[-N, N]^2$, and has the property that $B = (x, y) - B$. So $B$ is still
reasonably dense, and if it contains a subset $K$ then it also contains a translate of $-K$.
So we shall not worry about the condition $d > 0$. (I am grateful to Ben Green for bringing
this trick to my attention. As it happens, the resulting improvement to the theorem is
something of a side issue, since the positivity of $d$ does not tend to be used in applications.
See for instance Corollary 1.5 below. See also the remark at the beginning of the proof of
Theorem 10.3.)

Without loss of generality, the original set $A$ is symmetric in this sense. Let $X$ be the
set of all vertical lines through $[N]^2$, that is, subsets of the form $\{(x, y) : x = u\}$ for some
$u \in [N]$. Similarly, let $Y$ be the set of all horizontal lines. Define a third set, $Z$, of diagonal
lines, that is, lines of constant $x + y$. These sets form the vertex sets of a tripartite graph,
where a line in one set is joined to a line in another if and only if their intersection belongs
to $A$. For example, the line $x = u$ is joined to the line $y = v$ if and only if $(u, v) \in A$ and
the line $x = u$ is joined to the line $x + y = w$ if and only if $(u, w - u) \in A$.

Suppose that the resulting graph $G$ contains a triangle of lines $x = u$, $y = v$, $x + y = w$.
Then the points $(u, v)$, $(u, w - u)$ and $(w - v, v)$ all lie in $A$. Setting $d = w - u - v$, we can
rewrite them as $(u, v)$, $(u, v + d)$, $(u + d, v)$, which shows that we are done unless $d = 0$.
When $d = 0$, we have $u + v = w$, which corresponds to the degenerate case when the
vertices of the triangle in $G$ are three lines that intersect in a single point. Clearly, this
can happen in at most $|A| = o(N^3)$ ways.

Therefore, if $A$ contains no configuration of the desired kind, then the hypothesis of
Theorem 1.3 holds, and we can remove $o(N^2)$ edges from $G$ to make it triangle-free. But
this is a contradiction, because there are at least $\delta N^2$ degenerate triangles and they are
edge-disjoint. □
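The construction in this proof is easy to carry out explicitly. The following Python sketch (an illustration with our own function names, not part of the proof) lists the triangles of the tripartite graph of vertical, horizontal and diagonal lines for a given set $A \subset [N]^2$, and converts each non-degenerate triangle into the corresponding triple $(x, y)$, $(x + d, y)$, $(x, y + d)$. The sign of $d$ is not controlled here, in line with the remark above about the condition $d > 0$.

```python
def line_triangles(A, N):
    """Triangles (u, v, w) formed by the lines x = u, y = v and x + y = w."""
    A = set(A)
    found = []
    for u in range(1, N + 1):              # vertical line x = u
        for v in range(1, N + 1):          # horizontal line y = v
            if (u, v) not in A:
                continue
            for w in range(2, 2 * N + 1):  # diagonal line x + y = w
                if (u, w - u) in A and (w - v, v) in A:
                    found.append((u, v, w))
    return found


def corners_from_triangles(triangles):
    """Turn each non-degenerate triangle into a triple (x,y), (x+d,y), (x,y+d)."""
    corners = []
    for u, v, w in triangles:
        d = w - u - v
        if d != 0:                          # d = 0 is the degenerate case
            corners.append(((u, v), (u + d, v), (u, v + d)))
    return corners
```

For $A = \{(1,1), (2,1), (1,2)\}$ and $N = 2$ there are four triangles: three degenerate ones, one for each point of $A$, and one with $d = 1$ that yields the whole of $A$ as a configuration of the desired kind.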

An easy consequence of Corollary 1.4 is the case $k = 3$ of Szemerédi's theorem, which
was first proved by Roth [R] using Fourier analysis.

Corollary 1.5. For every $\delta > 0$ there exists $N$ such that every subset $A$ of $\{1, 2, \dots, N\}$
of size at least $\delta N$ contains an arithmetic progression of length 3.

Proof. Define $B \subset [N]^2$ to be the set of all $(x, y)$ such that $x + 2y \in A$. It is straightforward
to show that $B$ has density at least $\eta > 0$ for some $\eta$ that depends on $\delta$ only. Applying
Corollary 1.4 to $B$ we obtain inside it three points $(x, y)$, $(x + d, y)$ and $(x, y + d)$. Then
the three numbers $x + 2y$, $x + d + 2y$ and $x + 2(y + d)$ belong to $A$ and form an arithmetic
progression with common difference $d$. □
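The deduction can be traced on a small example. In the sketch below (illustrative only; the names are ours) `lift_to_grid` builds the set $B$ from $A$, and `progression_from_corner` maps a triple $(x, y)$, $(x + d, y)$, $(x, y + d)$ in $B$ to the three numbers used in the proof.

```python
def lift_to_grid(A, N):
    """B = {(x, y) in [N]^2 : x + 2y is in A}."""
    A = set(A)
    return {(x, y)
            for x in range(1, N + 1)
            for y in range(1, N + 1)
            if x + 2 * y in A}


def progression_from_corner(corner):
    """Map ((x, y), (x + d, y), (x, y + d)) to (x + 2y, x + d + 2y, x + 2(y + d))."""
    (x, y), (x_shifted, _), _ = corner
    d = x_shifted - x
    return (x + 2 * y, x + d + 2 * y, x + 2 * (y + d))
```

With $A = \{3, 4, 5\}$ and $N = 2$ the set $B$ is $\{(1,1), (2,1), (1,2)\}$, and its single configuration with $d = 1$ is sent to the progression 3, 4, 5.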

And now the programme for proving Szemerédi's theorem in general starts to become
clear. Suppose, for example, that one would like to prove it for progressions of length 4.
After a little thought, one sees that the direction in which one should generalize Theorem
1.3 is the one that takes graphs to 3-uniform hypergraphs, or 3-graphs, for short, which are
set systems consisting of subsets of size 3 of a set $X$ (just as a graph consists of pairs). If
$H$ is a 3-uniform hypergraph, then a simplex in $H$ is a set of four vertices $x$, $y$, $z$ and $w$
of $H$ (that is, elements of the set $X$) such that the four triples $xyz$, $xyw$, $xzw$ and $yzw$
all belong to $H$. The following theorem of Frankl and Rödl is a direct generalization of
Theorem 1.3, but its proof is much harder.

Theorem 1.6. For every constant $a > 0$ there exists a constant $c > 0$ with the following
property. If $H$ is any 3-uniform hypergraph with $n$ vertices that contains at most $cn^4$
simplices, then it is possible to remove at most $an^3$ edges from $H$ to make it simplex-free.

As observed by Solymosi, it is straightforward to generalize the proof of Corollary 1.4 and
show that Theorem 1.6 has the following consequence.

Theorem 1.7. For every $\delta > 0$ there exists $N$ such that every subset $A \subset [N]^3$ of size at
least $\delta N^3$ contains a quadruple of points of the form

$\{(x, y, z), (x + d, y, z), (x, y + d, z), (x, y, z + d)\}$

with $d > 0$.

Similarly, Szemerédi's theorem for progressions of length four is an easy consequence of
Theorem 1.7 (and once again one does not need the positivity of $d$).

It may look as though this section contains enough hints to enable any sufficiently 
diligent mathematician to complete a proof of the entire theorem. Indeed, here is a sketch 
for the 3-uniform case. First, one proves the appropriate 3-graph analogue of Szemerédi's
regularity lemma. Then, given a hypergraph H, one applies this lemma. Next, one removes 
all sparse triples and all triples that fail to be regular. If the resulting hypergraph contains 
a simplex, then any three of the four sets in which its vertices lie must form a dense regular 
triple, and therefore (by regularity) the hypergraph contains many simplices, contradicting 
the original assumption. 

The trouble with the above paragraph is that it leaves unspecified what it means for 
a triple to be regular. It turns out to be surprisingly hard to come up with an appropriate 
definition, where "appropriate" means that it must satisfy two conditions. First, it should 
be weak enough for a regularity lemma to hold: that is, one should always be able to 
divide a hypergraph up into regular pieces. Second, it should be strong enough to yield 
the conclusion that four sets of vertices, any three of which form a dense regular triple, 
should span many simplices. The definition that Frankl and Rödl used for this purpose
is complicated and it proved very hard to generalize. In [G2] we gave a different proof
which is in some ways more natural. The purpose of this paper is to generalize the results
of [G2] from 3-uniform hypergraphs to $k$-uniform hypergraphs for arbitrary $k$, thereby
proving the full multidimensional version of Szemerédi's theorem (Theorem 10.3 below),
which was first proved by Furstenberg and Katznelson [FK]. This is the first proof of the
multidimensional Szemerédi theorem that is not based on Furstenberg's ergodic-theoretic
approach, and also the first proof that gives an explicit bound. The bound, however, is 
very weak — it gives an Ackermann-type dependence on the initial parameters. 

Although this paper is self-contained, we recommend reading [G2] first. The case k = 3 
contains nearly all the essential ideas, and they are easier to understand when definitions 
and proofs can be given directly. Here, because we are dealing with a general k, many of 
the definitions have to be presented inductively. The resulting proofs can be neater, but 
they may appear less motivated if one has not examined smaller special cases. For this 
reason, we do indeed discuss a special case in the next section, but not in as complete a 
way as can be found in [G2]. Furthermore, the bulk of [G2] consists of background material
and general discussion (such as, for example, a complete proof of the regularity lemma for
graphs and a detailed explanation of how the ideas relate to those of the analytic approach
to Szemerédi's theorem in [G1]). Rather than repeat all that motivating material, we refer
the reader to that paper for it.

The main results of this paper have been obtained independently by Nagle, Rödl,
Schacht and Skokan [NRS,RS]. They too prove hypergraph generalizations of the regularity
and counting lemmas that imply Theorem 10.3 and Szemerédi's theorem. However, they
formulate their generalizations differently and there are substantial differences between
their proof and ours. Broadly speaking, they take the proof of Frankl and Rödl as their
starting point, whereas we start with the arguments of [G2]. This point is discussed in
more detail in the introduction to §6 of this paper, and also at the end of [G2].

§2. A discussion of a small example. 

The hardest part of this paper will be the proof of a counting lemma, which asserts 
that, under certain conditions, a certain type of structure "behaves randomly" in the sense 
that it contains roughly the expected number (asymptotically speaking) of configurations 
of any fixed size. In order even to state the lemma, we shall have to develop quite a lot 
of terminology, and the proof will involve a rather convoluted inductive argument with 
a somewhat strange inductive hypothesis. The purpose of this section is to give some of 
the argument in a special case. The example we have chosen is small enough that we can 
discuss it without the help of the terminology we use later: we hope that as a result the 
terminology will be much easier to remember and understand (since it can be related to 
the concrete example). Similarly, it should be much clearer why the inductive argument 
takes the form it does. From a logical point of view this section is not needed: the reader 
who likes to think formally and abstractly can skip it and move to the next section.¹

To put all this slightly differently, the argument is of the following kind: there are some
simple techniques that can be used quite straightforwardly to prove the counting lemma in
any particular case. However, as the case gets larger, the expressions that appear become
quite long (as will already be apparent in the example we are about to discuss), even if
the method for dealing with them is straightforward. In order to discuss the general case,
one is forced to describe in general terms what it is one is doing, rather than just going
ahead and doing it, and for that it is essential to devise a suitably compact notation, as
well as an inductive hypothesis that is sufficiently general to cover all intermediate stages
in the calculation.

¹ This section was not part of the original submitted draft. One of the referees suggested
treating a small case first, and when I reread the paper after a longish interval I could see
just how much easier it would be to understand if I followed the suggestion.

Now we are ready to turn to the example itself. Let $X$, $Y$, $Z$ and $T$ be four finite
sets. We shall adopt the convention that variables that use a lower-case letter of the
alphabet range over the set denoted by the corresponding upper-case letter. So, for example,
$x'$ would range over $X$. Similarly, if we refer to "the function $v(y, z, t)$," it should be
understood that $v$ is a function defined on $Y \times Z \times T$.

For this example, we shall look at three functions, $f(x, y, z)$, $u(x, y, t)$ and $v(y, z, t)$.
(The slightly odd choices of letters are deliberate: $f$ plays a different role from the other
functions and $t$ plays a different role from the other variables.) We shall also assume
that they are supported in a quadripartite graph $G$, with vertex sets $X$, $Y$, $Z$ and $T$,
in the sense that $f(x, y, z)$ is non-zero only if $xy$, $yz$ and $xz$ are all edges of $G$, and
similarly for the other two functions. As usual, we shall feel free to identify $G$ with its
own characteristic function, so another way of stating our assumption is that $f(x, y, z) =
f(x, y, z)G(x, y)G(y, z)G(x, z)$.

We will need one useful piece of shorthand as the proof proceeds. We shall write
$f_{x,x'}(y, z)$ for $f(x, y, z)f(x', y, z)$, and similarly for the other functions (including $G$) and
variables. We shall even iterate this, so that $f_{x,x',y,y'}(z)$ means

$f(x, y, z)f(x', y, z)f(x, y', z)f(x', y', z)$.

Of particular importance to us will be the quantity $\mathrm{Oct}(f) = \mathbb{E}_{x,x',y,y',z,z'} f_{x,x',y,y',z,z'}$,
which is a count of octahedra, each one weighted by the product of the values that $f$ takes
on its eight faces.
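For small sets the quantity $\mathrm{Oct}(f)$ can be computed directly from this definition. The Python sketch below is an illustration under our own conventions (the function name and the representation of $f$ as a Python callable are assumptions); it averages, over all choices of $x, x', y, y', z, z'$, the product of the eight values of $f$.

```python
from fractions import Fraction
from itertools import product


def oct_count(f, X, Y, Z):
    """Oct(f): the average of f_{x,x',y,y',z,z'} over x, x' in X, y, y' in Y, z, z' in Z."""
    total = Fraction(0)
    for x, x2, y, y2, z, z2 in product(X, X, Y, Y, Z, Z):
        term = 1
        # The eight faces of the octahedron: one vertex from each pair.
        for a, b, c in product((x, x2), (y, y2), (z, z2)):
            term *= f(a, b, c)
        total += term
    return total / (len(X) * len(Y) * len(Z)) ** 2
```

If $f$ is identically 1 then $\mathrm{Oct}(f) = 1$, and if $f$ is the characteristic function of the set where $x = 0$, with $X = \{0, 1\}$, then the product is non-zero only when $x = x' = 0$, so $\mathrm{Oct}(f) = 1/4$.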

Now let us try to obtain an upper bound for the quantity

$\mathbb{E}_{x,y,z,t} f(x, y, z)u(x, y, t)v(y, z, t)$.

Our eventual aim will be to show that this is small if $\mathrm{Oct}(f)$ is small and the six parts
of $G$ are sufficiently quasirandom. However, an important technical idea of the proof,
which simplifies it considerably, is to avoid using the quasirandomness of $G$ for as long as
possible. Instead, we make no assumptions about $G$ (though we imagine it as fairly sparse
and very quasirandom), and try to obtain an upper bound for our expression in terms of
$f_{x,x',y,y',z,z'}$ and $G$. Only later do we use the fact that we can handle quasirandom graphs.
In the more general situation, something similar occurs: now G becomes a hypergraph, 
but in a certain sense it is less complex than the original hypergraph, which means that its 
good behaviour can be assumed as the complicated inductive hypothesis alluded to earlier. 
As with many proofs in arithmetic combinatorics, the upper bound we are looking for
is obtained by repeated use of the Cauchy-Schwarz inequality, together with even more
elementary tricks such as interchanging the order of expectation, expanding out the square
of an expectation, or using the inequality $\mathbb{E}_x f(x)g(x) \le \|f\|_1\|g\|_\infty$. The one thing that
makes the argument slightly (but only slightly) harder than several other arguments of
this type is that it is essential to use the Cauchy-Schwarz inequality efficiently, and easy
not to do so if one is careless. In many arguments it is enough to use the inequality
$(\mathbb{E}_x f(x))^2 \le \mathbb{E}_x f(x)^2$, but for us this will usually be inefficient because it will usually
be possible to identify a small set of $x$ outside which $f(x)$ is zero. Letting $A$ be the
characteristic function of that set, we can write $f = Af$, and we then have the stronger
inequality $(\mathbb{E}_x f(x))^2 \le \mathbb{E}_x A(x) \mathbb{E}_x f(x)^2$.
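The gain from the stronger inequality is easy to see numerically. The following Python sketch is illustrative only: the function names are ours, and $A$ is taken to be the characteristic function of the exact support of $f$. It returns the left-hand side together with the efficient and the lazy upper bounds.

```python
from fractions import Fraction


def mean(values):
    """Exact average of a finite sequence of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def cauchy_schwarz_bounds(f_values):
    """Return ((E f)^2, E A * E f^2, E f^2), where A is the support of f."""
    f_values = [Fraction(v) for v in f_values]
    left = mean(f_values) ** 2
    second_moment = mean(v * v for v in f_values)
    support_density = mean(1 if v != 0 else 0 for v in f_values)
    return left, support_density * second_moment, second_moment
```

For a function equal to 1 at two of eight points and 0 elsewhere the three values are 1/16, 1/16 and 1/4: the efficient bound is sharp, while the lazy one loses a factor equal to the density of the support.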

Here, then, is the first part of the calculation that gives us the desired upper bound.
We need one further assumption: that the functions $f$, $u$ and $v$ take values in the interval
$[-1, 1]$.

$(\mathbb{E}_{x,y,z,t} f(x, y, z)u(x, y, t)v(y, z, t))^8$
$= (\mathbb{E}_{y,z,t} v(y, z, t) \mathbb{E}_x f(x, y, z)u(x, y, t))^8$
$\le (\mathbb{E}_{y,z,t} G(y, z)G(y, t)G(z, t))^4 (\mathbb{E}_{y,z,t} (\mathbb{E}_x f(x, y, z)u(x, y, t)v(y, z, t))^2)^4$.

The inequality here is Cauchy-Schwarz, and we have used the fact that $v(y, z, t)$ is non-zero
only if $G(y, z)G(y, t)G(z, t) = 1$. For the same reason, the second bracket is at most

$(\mathbb{E}_{y,z,t} (\mathbb{E}_x f(x, y, z)u(x, y, t)G(y, z)G(y, t)G(z, t))^2)^4$
$= (\mathbb{E}_{y,z,t} (\mathbb{E}_x f(x, y, z)u(x, y, t)G(z, t))^2)^4$
$= (\mathbb{E}_{x,x'} \mathbb{E}_{y,z,t} f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^4$
$\le \mathbb{E}_{x,x'} (\mathbb{E}_{y,z,t} f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^4$.

The first equality here follows from the fact that $G(y, z)$ and $G(y, t)$ are 1 whenever
$f(x, y, z)$ and $u(x, y, t)$ are non-zero. The inequality is a simple case of Cauchy-Schwarz,
applied twice.

Simple manipulations and arguments of the above kind are what we shall use in
general, but more important than these is the relationship between the first and last
expressions. We would like it if the last one was similar to the first, but in some sense
simpler, so that we could generalize both statements to one that can be proved inductively.

Certain similarities are immediately clear, as is the fact that the last expression, if we
fix $x$ and $x'$ rather than taking the first expectation, involves functions of two variables
rather than three, and a fourth power instead of an eighth power. The only small difference
is that we now have the function $G$ appearing rather than some arbitrary function
supported in $G$. This we shall have to incorporate into our inductive hypothesis somehow.

However, in this small case, we can simply try to repeat the argument, so let us 
continue with the calculation:

$(\mathbb{E}_{y,z,t} f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^4$
$= (\mathbb{E}_{z,t} \mathbb{E}_y f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^4$
$= (\mathbb{E}_{z,t} \mathbb{E}_y f_{x,x'}(y, z)u_{x,x'}(y, t)G_{x,x'}(z)G_{x,x'}(t)G(z, t))^4$
$\le (\mathbb{E}_{z,t} G_{x,x'}(z)G_{x,x'}(t)G(z, t))^2 (\mathbb{E}_{z,t} (\mathbb{E}_y f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^2)^2$.

Here, we used the fact that $f_{x,x'}(y, z)$ is non-zero only if $G(x, z)$ and $G(x', z)$ are both
equal to 1, with a similar statement for $u_{x,x'}(y, t)$. We then applied the Cauchy-Schwarz
inequality together with the fact that $G$ squares to itself. Given that $G$ could be quite
sparse, it was important here that we exploited its sparseness to the full: with a lazier 
use of the Cauchy-Schwarz inequality we would not have obtained the factor in the first 
bracket, which will in general be small and not something we can afford to forget about. 

Now let us continue to manipulate the second bracket in the standard way: expanding 
the inner square, rearranging, and applying Cauchy-Schwarz. This time, in order not to 
throw away any sparseness information, we will bear in mind that the expectation over y 
and y' below is zero unless all of G{x,y), G{x',y), G{x,y') and G{x',y') are equal to 1. 

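$(\mathbb{E}_{z,t} (\mathbb{E}_y f_{x,x'}(y, z)u_{x,x'}(y, t)G(z, t))^2)^2$
$= (\mathbb{E}_{y,y'} G_{x,x',y,y'} \mathbb{E}_{z,t} f_{x,x',y,y'}(z)u_{x,x',y,y'}(t)G(z, t))^2$
$\le (\mathbb{E}_{y,y'} G_{x,x',y,y'}) (\mathbb{E}_{y,y'} (\mathbb{E}_{z,t} f_{x,x',y,y'}(z)u_{x,x',y,y'}(t)G(z, t))^2)$.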

We have now got down to functions of one variable, apart from the term G{z,t). 
Instead of worrying about this, let us continue the process. 

$(\mathbb{E}_{z,t} f_{x,x',y,y'}(z)u_{x,x',y,y'}(t)G(z, t))^2$
$= (\mathbb{E}_t \mathbb{E}_z f_{x,x',y,y'}(z)u_{x,x',y,y'}(t)G(z, t))^2$.
Now we shall apply Cauchy-Schwarz again, and again we must be careful to use the
full strength of the inequality by taking account that for most values of $t$ the expectation
over $z$ is zero. We can do this by noting that

$u_{x,x',y,y'}(t) = u_{x,x',y,y'}(t)G_{x,x'}(t)G_{y,y'}(t)$,

so the last expression above is at most

$(\mathbb{E}_t G_{x,x'}(t)G_{y,y'}(t)) (\mathbb{E}_t (\mathbb{E}_z f_{x,x',y,y'}(z)u_{x,x',y,y'}(t)G(z, t))^2)$.

The second term in this product is at most

$\mathbb{E}_t (\mathbb{E}_z f_{x,x',y,y'}(z)G_{x,x'}(t)G_{y,y'}(t)G(z, t))^2$,

which equals

$\mathbb{E}_t \mathbb{E}_{z,z'} f_{x,x',y,y',z,z'}G_{x,x'}(t)G_{y,y'}(t)G_{z,z'}(t)$.
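The last step is an instance of "expanding out the square of an expectation". As a check on the bookkeeping, the following Python sketch verifies the identity in a stripped-down setting, which is an assumption of the sketch: a single function $h(z)$ stands for $f_{x,x',y,y'}(z)$, a 0/1 matrix `G[z][t]` stands for the product of graph terms, and the function names are ours.

```python
from fractions import Fraction


def mean(values):
    """Exact average of a finite sequence of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def squared_form(h, G):
    """E_t (E_z h(z) G(z, t))^2, with h a list over Z and G a matrix G[z][t]."""
    Z, T = range(len(G)), range(len(G[0]))
    return mean(mean(h[z] * G[z][t] for z in Z) ** 2 for t in T)


def expanded_form(h, G):
    """E_t E_{z,z'} h(z) h(z') G(z, t) G(z', t)."""
    Z, T = range(len(G)), range(len(G[0]))
    return mean(
        mean(h[z] * h[z2] * G[z][t] * G[z2][t] for z in Z for z2 in Z)
        for t in T
    )
```

The two functions agree for every choice of `h` and `G`, since the second is obtained from the first by writing the square of the inner average as a double average over $z$ and $z'$.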

Let us put all this together and see what the upper bound is that we have obtained. 
It works out to be 

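$(\mathbb{E}_{y,z,t} G(y, z)G(y, t)G(z, t))^4 \mathbb{E}_{x,x'} (\mathbb{E}_{z,t} G_{x,x'}(z)G_{x,x'}(t)G(z, t))^2 (\mathbb{E}_{y,y'} G_{x,x',y,y'})$
$\mathbb{E}_{y,y'} (\mathbb{E}_t G_{x,x'}(t)G_{y,y'}(t)) \mathbb{E}_{z,z'} f_{x,x',y,y',z,z'} \mathbb{E}_t G_{x,x'}(t)G_{y,y'}(t)G_{z,z'}(t)$.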

Here we have been somewhat sloppy with our notation: a more correct way of writing
the above expression would be to have different names for the variables in different
expectations. If one does that and then expands out the powers of the brackets, then one
obtains an expression with several further variables besides $x$, $x'$, $y$, $y'$, $z$, $z'$ and $t$. One
takes the average, over all these variables, of an expression that includes $f_{x,x',y,y',z,z'}$ and
many terms involving the function $G$ applied to various pairs of the variables. Recall that
this is what we were trying to do.

We can interpret this complicated expression as follows. We allow the variables to
represent the vertices of a quadripartite graph $\Gamma$, with two variables $q$ and $r$ joined by an
edge if $G(q, r)$ appears in the product. For example, the $G_{z,z'}(t)$ that appears at the end
of the expression is short for $G(z, t)G(z', t)$, so it would tell us that $zt$ and $z't$ were edges
of the graph (assuming that those particular variables had not had their names changed).

When we assign values in $X$, $Y$, $Z$ and $T$ to the various variables, we are defining a
quadripartite map $\phi$ from the vertex set of $\Gamma$ to the set $X \cup Y \cup Z \cup T$. And the product of
all the terms involving $G$ is telling us whether a particular assignment to the variables of
values in $X$, $Y$, $Z$ and $T$ results in a graph homomorphism from $\Gamma$ to $G$.

Thus, the expression we obtain is an expectation over all such quadripartite maps $\phi$ of
$f_{x,x',y,y',z,z'}$ multiplied by the characteristic function of the event "$\phi$ is a homomorphism."

Notice that in this expression the function $f$ appears eight times, as it does in the
expression with which we started, since that contains a single $f$ inside the bracket, which
is raised to the eighth power. This is important, as we need our inequality to scale up in
the right way. But equally important is that this scaling should occur correctly in $G$ as
well. We can think of $G$ as put together out of six functions (one for each pair of vertex
sets). Let us now reflect this in our notation, writing $G_{XY}$ for the part of $G$ that joins $X$
to $Y$, and so on. If we want to make explicit the fact that $f$, $u$ and $v$ are zero except at
triangles in $G$, then we can rewrite the first expression as

$(\mathbb{E}_{x,y,z,t} f(x, y, z)u(x, y, t)v(y, z, t)G_{XY}(x, y)G_{XZ}(x, z)G_{XT}(x, t)G_{YZ}(y, z)G_{YT}(y, t)G_{ZT}(z, t))^8$.

This makes it clear that each part of $G$ (such as $G_{XY}$) occurs eight times. In order to
have a useful inequality we need the same to be true for the final expression that we are
using to bound this one. As it is written at the moment, $G_{XT}$, $G_{YT}$ and $G_{ZT}$ are used
eight times each, but $G_{XY}$, $G_{YZ}$ and $G_{XZ}$ are used only four times each. However, there
are once again some implicit appearances, hidden in our assumptions about when $f$ can
be non-zero. In particular, we can afford to multiply $f_{x,x',y,y',z,z'}$ by the product over all
graph terms, such as $G_{YZ}(y', z)$, that must equal 1 if $f_{x,x',y,y',z,z'}$ is non-zero. This gives
us four extra occurrences of each of $G_{XY}$, $G_{YZ}$ and $G_{XZ}$.

We eventually want to show that if $\mathrm{Oct}(f)$ is small and all the functions such as $G_{XY}$
are "sufficiently quasirandom", then the expression with which we started is small. In
order to see what we do next, let us abandon our current example, since it has become
quite complicated, and instead look at a simpler example that has the same important
features. In order to make this simpler example properly illustrative of the general case,
it will help if we no longer assume that $G$ uses all the vertices in $X$, $Y$, $Z$ and $T$. Rather,
we shall let $P$, $Q$, $R$ and $S$ be subsets of $X$, $Y$, $Z$ and $T$, respectively, and $G$ will be a
graph that does not join any vertices outside these subsets. Then we shall consider how
to approximate the quantity

$\mathbb{E}_{x,y,z,t} f(x, y, z)G(x, t)G(y, t)G(z, t)P(x)Q(y)R(z)S(t)$

by the quantity

$\mathbb{E}_{x,y,z,t} f(x, y, z)\delta_{XT}G(y, t)G(z, t)P(x)Q(y)R(z)S(t)$,

where $\delta_{XT}$ is now the relative density of $G$ inside the set $P \times S$ (rather than its absolute
density inside $X \times T$). The sets $P$, $Q$, $R$ and $S$ will themselves have densities, which we
shall call $\delta_X$, $\delta_Y$, $\delta_Z$ and $\delta_T$.

To begin with, we define a function $g$ in the variables $x$ and $t$ by taking $g(x, t)$ to be
$G(x, t) - \delta_{XT}$ when $(x, t) \in P \times S$ and 0 otherwise. The idea behind this definition is that
we want to subtract from $G(x, t)$ a function that is supported in $P \times S$ and constant there,
in such a way that the average becomes zero. Once we have done that, our task is then to
show that

$\mathbb{E}_{x,y,z,t} f(x, y, z)g(x, t)G(y, t)G(z, t)P(x)Q(y)R(z)S(t)$

is small, provided that $\mathrm{Oct}(g) = \mathbb{E}_{x,x',t,t'} g_{x,x',t,t'}$ is small enough.

The technique of proof is the same as we have already seen: we give the argument
mainly to illustrate what we can afford to ignore and what we must be careful to take
account of. Since $g$ is a function of two variables, we shall start with the expression

$(\mathbb{E}_{x,y,z,t} f(x, y, z)g(x, t)G(y, t)G(z, t))^4$
$= (\mathbb{E}_{y,z,t} \mathbb{E}_x g(x, t)f(x, y, z)G(y, t)G(z, t))^4$
$\le (\mathbb{E}_{y,z,t} G(y, z)G(y, t)G(z, t))^2 (\mathbb{E}_{y,z,t} (\mathbb{E}_x g(x, t)f(x, y, z)G(y, t)G(z, t))^2)^2$.

Now, we shall eventually be assuming that $\mathrm{Oct}(g)$ is significantly smaller than the densities
of any of the parts of $G$, but not necessarily smaller than the densities of the sets $P$, $Q$, $R$
and $S$. The effect on our calculations is that we can afford to throw away the $G$-densities
(by replacing them by 1) but must be careful to keep account of the densities of vertex
sets. Thus, we may replace the expectation $\mathbb{E}_{y,z,t} G(y, z)G(y, t)G(z, t)$ in the first bracket
by the larger expectation $\mathbb{E}_{y,z,t} Q(y)R(z)S(t)$. (This is of course easily seen to be $\delta_Y\delta_Z\delta_T$,
but in more general situations it will not necessarily be easy to calculate.)
As for the second part of the product, it equals

$(\mathbb{E}_{y,z,t} G(y, t)G(z, t) (\mathbb{E}_x g(x, t)f(x, y, z))^2)^2$,

which we can afford to bound above by

$(\mathbb{E}_{y,z,t} Q(y)R(z)S(t) (\mathbb{E}_x g(x, t)f(x, y, z))^2)^2$
$= (\mathbb{E}_{x,x'} \mathbb{E}_{y,z,t} g_{x,x'}(t)f_{x,x'}(y, z)Q(y)R(z)S(t))^2$
$= (\mathbb{E}_{x,x'} \mathbb{E}_{y,z,t} g_{x,x'}(t)P(x)P(x')f_{x,x'}(y, z))^2$
$\le (\mathbb{E}_{x,x'} P(x)P(x')) (\mathbb{E}_{x,x'} (\mathbb{E}_{y,z,t} g_{x,x'}(t)f_{x,x'}(y, z))^2)$.

Now we concentrate our efforts on the second bracket.

$(\mathbb{E}_{y,z,t} g_{x,x'}(t)f_{x,x'}(y, z))^2$
$= (\mathbb{E}_{y,z} Q(y)R(z)f_{x,x'}(y, z) \mathbb{E}_t g_{x,x'}(t))^2$
$\le (\mathbb{E}_{y,z} Q(y)R(z)f_{x,x'}(y, z)^2) (\mathbb{E}_{y,z} Q(y)R(z) (\mathbb{E}_t g_{x,x'}(t))^2)$.

Since $f$ is a function of three variables, we are even more prepared to bound $f_{x,x'}(y, z)^2$
above by 1 than we were with $G$. That is, we can bound the first bracket above by
$\mathbb{E}_{y,z} P(x)P(x')Q(y)R(z)$. The second equals $\mathbb{E}_{y,z,t,t'} Q(y)R(z)g_{x,x',t,t'}$. Since the second
is automatically zero if $P(x)P(x')$ is zero, we can even afford to bound the first one by
$\mathbb{E}_{y,z} Q(y)R(z)$.

Putting all this together, we find that

$(\mathbb{E}_{x,y,z,t} f(x, y, z)g(x, t)G(y, t)G(z, t))^4$

is at most

$(\mathbb{E}_{y,z,t} Q(y)R(z)S(t))^2 (\mathbb{E}_{x,x'} P(x)P(x'))$
$(\mathbb{E}_{x,x'} (\mathbb{E}_{y,z} Q(y)R(z)) (\mathbb{E}_{y,z,t,t'} Q(y)R(z)g_{x,x',t,t'}))$.

It is not hard to check that this equals $\delta_X^2\delta_Y^4\delta_Z^4\delta_T^2 \mathrm{Oct}(g)$. This quantity will count as a
small error if $\mathrm{Oct}(g)$ is small compared with $\delta_X^2\delta_T^2$, since then our upper bound is small
compared with its trivial maximum of $\delta_X^4\delta_Y^4\delta_Z^4\delta_T^4$ (which, in the general case, is rather less
trivial).
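The inequality just obtained can be sanity-checked by brute force on small random instances. The Python sketch below is only an illustration, under assumptions that we state explicitly: the four vertex sets all have the same size $n$, the functions are stored as nested lists, $f$ takes values in $\{-1, 0, 1\}$ and vanishes outside $P \times Q \times R$, the parts of $G$ that meet $T$ do not leave the chosen subsets, and all names are ours. It returns the fourth power of the expectation together with the bound $\delta_X^2\delta_Y^4\delta_Z^4\delta_T^2 \mathrm{Oct}(g)$.

```python
import random
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact average of a finite sequence of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def check_bound(n=3, seed=0):
    """Return (lhs, rhs) for one random instance with |X| = |Y| = |Z| = |T| = n."""
    rng = random.Random(seed)
    V = range(n)
    # 0/1 indicators of P, Q, R, S; P and S are forced to be non-empty
    # so that the relative density delta_XT is defined.
    P, Q, R, S = ([rng.randint(0, 1) for _ in V] for _ in range(4))
    P[0] = S[0] = 1
    # The three parts of G that meet T, with no edge leaving the subsets.
    Gxt = [[P[x] * S[t] * rng.randint(0, 1) for t in V] for x in V]
    Gyt = [[Q[y] * S[t] * rng.randint(0, 1) for t in V] for y in V]
    Gzt = [[R[z] * S[t] * rng.randint(0, 1) for t in V] for z in V]
    # f has values in {-1, 0, 1} and is supported in P x Q x R.
    f = [[[P[x] * Q[y] * R[z] * rng.choice((-1, 0, 1)) for z in V]
          for y in V] for x in V]
    # g = G - delta_XT on P x S and 0 elsewhere.
    delta_XT = Fraction(sum(Gxt[x][t] for x in V for t in V), sum(P) * sum(S))
    g = [[(Gxt[x][t] - delta_XT) * P[x] * S[t] for t in V] for x in V]

    lhs = mean(f[x][y][z] * g[x][t] * Gyt[y][t] * Gzt[z][t]
               for x, y, z, t in product(V, repeat=4)) ** 4
    oct_g = mean(g[x][t] * g[x2][t] * g[x][t2] * g[x2][t2]
                 for x, x2, t, t2 in product(V, repeat=4))
    dX, dY, dZ, dT = (Fraction(sum(A), n) for A in (P, Q, R, S))
    rhs = dX ** 2 * dY ** 4 * dZ ** 4 * dT ** 2 * oct_g
    return lhs, rhs
```

On every instance the first returned value should be at most the second; exact rational arithmetic is used so that the comparison is not affected by rounding.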

An important point to note about the above argument is that even though the
expression we started with included a function of three variables, it did not cause us any
difficulty because we were eventually able to bound it above in a simple way. This explains
why an inductive argument is possible: when we are dealing with functions of $k$ variables
$x_1, \dots, x_k$, we do not have any trouble from functions of more variables, provided that at
least one of $x_1, \dots, x_k$ is not included in them.

Of course, once we have replaced $G(x, t)$ by $\delta_{XT}P(x)S(t)$ we can run similar arguments
to replace $G(y, t)$ and $G(z, t)$ by $\delta_{YT}Q(y)S(t)$ and $\delta_{ZT}R(z)S(t)$, respectively. Thus, there
will be three nested inductions going on at once: the number of variables $k$ in the function
under consideration, the number of functions of $k$ variables still left to consider, and the
number of steps taken in the process of replacing a function $f$ by a function of the form
$f_{x_1,x_1',\dots,x_k,x_k'}$. Section 4 is concerned with the last of these, and the first two are dealt
with in Section 5.

§3. Some basic definitions. 

The need for a more compact notation should by now be clear. In this section, we 
shall provide such a notation and also explain the terminology that will be needed to state 
our main results. 

3.1. Hypergraphs and chains. 

An $r$-partite hypergraph is a sequence $X_1,\dots,X_r$ of disjoint sets, together with a
collection $\mathcal{H}$ of subsets $A$ of $X_1\cup\dots\cup X_r$ with the property that $|A\cap X_i|\le 1$ for every $i$.
The sets $X_i$ are called vertex sets and their elements are vertices. The elements of $\mathcal{H}$ are
called edges, or sometimes hyperedges if there is a danger of confusing them with edges in
the graph-theoretic sense. A hypergraph is $k$-uniform if all its edges have size $k$. (Thus, a
2-uniform hypergraph is a graph.)

An $r$-partite hypergraph $\mathcal{H}$ is called an $r$-partite chain if it has the additional property
that $B$ is an edge of $\mathcal{H}$ whenever $A$ is an edge of $\mathcal{H}$ and $B\subset A$. Thus, an $r$-partite chain
is a particular kind of combinatorial simplicial complex, or down-set. Our use of the word
"chain" is non-standard (in particular, it has nothing to do with the notion of a chain
complex in algebraic topology). We use it because it is quicker to write than "simplicial
complex".

If the largest size of any edge of $\mathcal{H}$ is $k$, then we shall sometimes say that $\mathcal{H}$ is a
$k$-chain.
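The definitions of this subsection are easy to test on small examples. The following is a minimal sketch, assuming that vertices are hashable labels, that vertex sets are given as a list of frozensets, and that edges are stored as frozensets; the helper names `down_closure` and `is_chain` are hypothetical.

```python
from itertools import combinations


def down_closure(edges):
    """Smallest down-set containing the given edges (all their subsets)."""
    closed = set()
    for edge in edges:
        members = tuple(edge)
        for size in range(len(members) + 1):
            closed.update(frozenset(c) for c in combinations(members, size))
    return closed


def is_chain(edges, parts):
    """True if `edges` is an r-partite hypergraph closed under taking subsets.

    `parts` lists the disjoint vertex sets X_1, ..., X_r.  Each edge must lie
    in their union and meet every X_i in at most one vertex.
    """
    edges = {frozenset(e) for e in edges}
    vertices = frozenset().union(*parts)
    partite = all(e <= vertices and all(len(e & p) <= 1 for p in parts)
                  for e in edges)
    return partite and down_closure(edges) == edges
```

For instance, the down-closure of a single triple with one vertex in each of three vertex sets is a 3-chain with eight edges (counting the empty set), whereas a single pair on its own is an $r$-partite hypergraph but not a chain.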

3.2. Homomorphisms and r-partite functions. 

Let $E_1,\dots,E_r$ and $X_1,\dots,X_r$ be two sequences of disjoint finite sets. If $\phi$ is a map
from $E_1\cup\dots\cup E_r$ to $X_1\cup\dots\cup X_r$ such that $\phi(E_i)\subset X_i$ for every $i$, we shall say that
$\phi$ is an $r$-partite function.

Let $\mathcal{J}$ be an $r$-partite chain with vertex sets $E_1,\dots,E_r$ and let $\mathcal{H}$ be an $r$-partite
chain with vertex sets $X_1,\dots,X_r$. Let $\phi$ be an $r$-partite function from the vertices of $\mathcal{J}$
to the vertices of $\mathcal{H}$. We shall say that $\phi$ is a homomorphism from $\mathcal{J}$ to $\mathcal{H}$ if $\phi(A)\in\mathcal{H}$
whenever $A\in\mathcal{J}$. We shall write $\mathrm{Hom}(\mathcal{J},\mathcal{H})$ for the set of all homomorphisms from $\mathcal{J}$
to $\mathcal{H}$.
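As an illustration of these definitions, the sketch below enumerates all $r$-partite maps between two small systems of vertex sets and counts the homomorphisms by brute force. The representation (maps as dictionaries, chains as sets of frozensets) and the example data at the end are assumptions made for the illustration only.

```python
from itertools import combinations, product


def down_closure(edges):
    """Smallest down-set containing the given edges (all their subsets)."""
    closed = set()
    for edge in edges:
        members = tuple(edge)
        for size in range(len(members) + 1):
            closed.update(frozenset(c) for c in combinations(members, size))
    return closed


def partite_maps(E_parts, X_parts):
    """Yield every r-partite map phi (as a dict) with phi(E_i) inside X_i."""
    domain = [(v, i) for i, part in enumerate(E_parts) for v in sorted(part)]
    choices = [sorted(X_parts[i]) for _, i in domain]
    for images in product(*choices):
        yield {v: image for (v, _), image in zip(domain, images)}


def is_homomorphism(phi, J, H):
    """phi is a homomorphism if phi(A) is an edge of H for every A in J."""
    return all(frozenset(phi[v] for v in A) in H for A in J)


def hom_count(J, H, E_parts, X_parts):
    """Size of Hom(J, H), computed by checking every r-partite map."""
    return sum(is_homomorphism(phi, J, H)
               for phi in partite_maps(E_parts, X_parts))


# Example: J is a single edge {1, 2}; H is a bipartite graph with three edges.
E_parts = [{1}, {2}]
X_parts = [{"a", "b"}, {"c", "d"}]
J = down_closure([{1, 2}])
H = down_closure([{"a", "c"}, {"b", "c"}, {"b", "d"}])
```

In this example a homomorphism is just a choice of an edge of the graph, so `hom_count` returns the number of edges.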

3.3. $A$-functions and $\mathcal{J}$-functions.

Let $\Phi$ be the set of all $r$-partite maps from $E_1\cup\dots\cup E_r$ to $X_1\cup\dots\cup X_r$. We shall also
consider some special classes of functions defined on $\Phi$. If $A$ is a subset of $E_1\cup\dots\cup E_r$ such
that $|A\cap E_i|\le 1$ for every $i$, then a function $f:\Phi\to[-1,1]$ will be called an $A$-function if
the value of $f(\phi)$ depends only on the image $\phi(A)$. If $\mathcal{J}$ is an $r$-partite chain with vertex
sets $E_1,\dots,E_r$, then a $\mathcal{J}$-function is a function $f:\Phi\to[-1,1]$ that can be written as a
product $f=\prod_{A\in\mathcal{J}}f^A$, where each $f^A$ is an $A$-function.

The definition of $A$-functions and $\mathcal{J}$-functions is introduced in order to deal with
situations where we have a function of several variables that can be written as a product
of other functions each of which depends on only some of those variables. We met various
functions of this type in the previous section. Let us clarify the definition with another
small example. Suppose that we have three sets $X_1$, $X_2$ and $X_3$ and a function
$f:X_1^2\times X_2\times X_3\to[-1,1]$ of the form

$$f(x_1,x_1',x_2,x_3)=f_1(x_1,x_2)f_2(x_1,x_3)f_3(x_1',x_2)f_4(x_1',x_3).$$

Let $E_1=\{1,1'\}$, $E_2=\{2\}$ and $E_3=\{3\}$. There is an obvious one-to-one correspondence
between quadruples $(x_1,x_1',x_2,x_3)$ and tripartite maps from $E_1\cup E_2\cup E_3$: given such a
sequence one associates with it the map $\phi$ that takes 1 to $x_1$, $1'$ to $x_1'$, 2 to $x_2$ and 3 to $x_3$.
Therefore, we can if we wish change to a more opaque notation and write

$$f(\phi)=f_1(\phi)f_2(\phi)f_3(\phi)f_4(\phi).$$

Now $f_2(\phi)=f_2(\phi(1),\phi(3))=f_2(\phi(\{1,3\}))$, so $f_2$ is a $\{1,3\}$-function. Similar remarks
can be made about $f_1$, $f_3$ and $f_4$. It follows that $f$ is a $\mathcal{J}$-function if we take $\mathcal{J}$ to be the
chain consisting of the sets $\{1,2\}$, $\{1,3\}$, $\{1',2\}$ and $\{1',3\}$ and all their subsets. The fact
that the subsets are not mentioned in the formula does not matter, since if $C$ is one of
these subsets we can take the function that is identically 1 as our $C$-function.

An important and more general example is the following. As above, let $\mathcal{J}$ be an
$r$-partite chain with vertex sets $E_1,\dots,E_r$ and let $\mathcal{H}$ be an $r$-partite chain with vertex
sets $X_1,\dots,X_r$. For each $\phi$ in $\Phi$ and each $A\in\mathcal{J}$ let $H^A(\phi)$ equal 1 if $\phi(A)\in\mathcal{H}$ and 0
otherwise. Let $\mathcal{H}(\phi)=\prod_{A\in\mathcal{J}}H^A(\phi)$. Then $\mathcal{H}(\phi)$ equals 1 if $\phi\in\mathrm{Hom}(\mathcal{J},\mathcal{H})$ and 0
otherwise. In other words, the characteristic function of $\mathrm{Hom}(\mathcal{J},\mathcal{H})$ is a $\mathcal{J}$-function. We
stress that $\mathcal{H}(\phi)$ depends on $\mathcal{J}$; however, it is convenient to suppress this dependence in
the notation. Our counting lemma will count homomorphisms from small chains $\mathcal{J}$ to
large quasirandom chains $\mathcal{H}$, so we can regard our main aim as being to estimate the sum
(or equivalently, expectation) of $\mathcal{H}(\phi)$ over all $\phi\in\Phi$. However, in order to do so we need
to consider more general $\mathcal{J}$-functions.

The $\mathcal{J}$-functions we consider will be supported in a chain $\mathcal{H}$, in the following sense.
Let us say that an $A$-function $f^A$ is supported in $\mathcal{H}$ if $f^A(\phi)$ is zero whenever $\phi(A)$ fails
to be an edge of $\mathcal{H}$. Equivalently, $f^A$ is supported in $\mathcal{H}$ if $f^A=f^AH^A$, where $H^A$ is as
defined above. We shall say that $f$ is a $\mathcal{J}$-function on $\mathcal{H}$ if it can be written as a product
$\prod_{A\in\mathcal{J}}f^A$, where each $f^A$ is an $A$-function supported in $\mathcal{H}$. If $f$ is a $\mathcal{J}$-function on $\mathcal{H}$, then
$f(\phi)=0$ whenever $\phi$ does not belong to $\mathrm{Hom}(\mathcal{J},\mathcal{H})$. That is, $f(\phi)=f(\phi)\mathcal{H}(\phi)$. Notice
that the product of any $\mathcal{J}$-function with the function $\mathcal{H}$ will be a $\mathcal{J}$-function on $\mathcal{H}$.

This is another definition that came up in the previous section. In that case, the three
functions in the product $f(x,y,z)u(x,y,t)v(y,z,t)$ considered in the previous section were
all supported in the chain $\mathcal{H}$ that consisted of the triangles in the graph $G$, the edges of
$G$, and the vertices of $G$. If we let $\mathcal{J}$ be the chain consisting of the sets $\{x,y,z\}$, $\{x,y,t\}$,
$\{y,z,t\}$ and all their subsets (where we are regarding the letters as names of variables
rather than elements of $X$, $Y$, $Z$ and $T$), then this product is a $\mathcal{J}$-function on $\mathcal{H}$.

3.4. The index of a set, and relative density in a chain. 

Let $\mathcal{H}$ be an $r$-partite chain with vertex sets $X_1,\dots,X_r$. Given a set $F\in\mathcal{H}$, define
its index $i(F)$ to be the set of all $i$ such that $F\cap X_i$ is non-empty. (Recall that $F\cap X_i$ is a
singleton for each such $i$.) For any set $A$ in any $r$-partite chain, let $H(A)$ be the collection
of all sets $E\in\mathcal{H}$ of index equal to that of $A$. If $A$ has cardinality $k$, then let $H_*(A)$ be
the collection of all sets $D$ of index $i(A)$ such that $C\in\mathcal{H}$ whenever $C\subset D$ and $C$ has
cardinality $k-1$. (Since $\mathcal{H}$ is a chain, it follows from this that all proper subsets of $D$
belong to $\mathcal{H}$. Note that we do not require $D$ to belong to $\mathcal{H}$.) Clearly $H(A)\subset H_*(A)$.
The relative density of $H(A)$ in $\mathcal{H}$ is defined to be $|H(A)|/|H_*(A)|$. We will denote it by
$\delta_A$.

Once again, the example in the last section illustrates the importance of $H_*(A)$. Let
us rename the vertex sets $X$, $Y$, $Z$ and $T$ as $X_1$, $X_2$, $X_3$ and $X_4$. If $\mathcal{H}$ is a 3-chain that
consists of the edges and vertices of the graph $G$, and some collection of triangles of $G$,
and if $A=\{1,2,3\}$, say, then $H_*(A)$ consists of all triangles in $G$ with one vertex in each
of $X_1$, $X_2$ and $X_3$, while $H(A)$ consists of all 3-edges of $\mathcal{H}$ with one vertex in each of $X_1$,
$X_2$ and $X_3$. Thus, $\delta_A$ measures the proportion of the triangles in $G$ that are edges in $\mathcal{H}$.

It is useful to interpret the relative density $\delta_A$ probabilistically: it is the conditional
probability that a randomly chosen set $D\subset X_1\cup\dots\cup X_r$ of index $i(A)$ belongs to $\mathcal{H}$ (and
hence to $H(A)$), given that all its proper subsets belong to $\mathcal{H}$.
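The relative density can be computed directly from the definition on a small chain. The following sketch assumes that the chain is stored as a set of frozensets, that the vertex sets are indexed from 0, and that the index passed in is non-empty and has $H_*(A)$ non-empty; `example_chain` builds a toy 3-chain (a complete tripartite graph with one edge removed, plus two of its triangles) purely for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def index_of(edge, parts):
    """Index i(F): the set of i such that the edge meets the vertex set X_i."""
    return frozenset(i for i, part in enumerate(parts) if edge & part)


def relative_density(H, parts, index):
    """|H(A)| / |H_*(A)| for sets of the given (non-empty) index."""
    H = {frozenset(e) for e in H}
    index = sorted(index)
    k = len(index)
    # H(A): edges of the chain whose index is the given one.
    H_A = [e for e in H if index_of(e, parts) == frozenset(index)]
    # H_*(A): sets of that index all of whose (k-1)-subsets lie in the chain.
    candidates = (frozenset(c)
                  for c in product(*(sorted(parts[i]) for i in index)))
    H_star = [D for D in candidates
              if all(frozenset(C) in H for C in combinations(D, k - 1))]
    return Fraction(len(H_A), len(H_star))


def example_chain():
    """Toy 3-chain on three vertex sets of size 2."""
    parts = [frozenset({"a1", "a2"}), frozenset({"b1", "b2"}),
             frozenset({"c1", "c2"})]
    H = {frozenset()}
    H |= {frozenset({v}) for part in parts for v in part}
    H |= {frozenset({u, v})
          for P, Q in combinations(parts, 2) for u in P for v in Q}
    H -= {frozenset({"a1", "b2"})}          # remove one edge of the graph
    H |= {frozenset({"a1", "b1", "c1"}), frozenset({"a2", "b2", "c2"})}
    return H, parts
```

In the toy chain, six of the eight possible triples are triangles of the graph and two of those are 3-edges, so the relative density of the top level is $1/3$ rather than $1/4$: the denominator counts only sets whose proper subsets already belong to the chain.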

Notational remark. It may help the reader to remember the definitions in this section
if we explicitly point out that most of the time we are adopting the following conventions.
The symbols $\mathcal{J}$ and $\mathcal{K}$ are used for chains of fixed size that are embedded into a chain $\mathcal{H}$ of
size tending to infinity. From these we sometimes form other chains: for instance, $\mathcal{J}_1$ will
be a chain of fixed size derived from a chain $\mathcal{J}$, and $\mathcal{H}(x)$ will be a chain of size tending to
infinity that depends on a point $x$. The italic letter $H$ will tend to be reserved for set systems
connected with $\mathcal{H}$ where the sets all have the same index. The same goes for functions
derived from $\mathcal{H}$. For example, we write $\mathcal{H}(\phi)$ because we use the full chain $\mathcal{H}$ to define the
function, whereas we write $H^A(\phi)$ because for that we just use sets of index $i(A)$, which
all have size $|A|$. Similarly, we write $H_*(A)$ because all sets in $H_*(A)$ have index $i(A)$.

3.5. $\mathrm{Oct}(f^A)$ for an $A$-function $f^A$.

We are building up to a definition of quasirandomness for $H(A)$. An important
ingredient of the definition is a weighted count of combinatorial octahedra, which
generalizes the definition introduced in the last section. If $f$ is a function of three variables
$x$, $y$ and $z$ that range over sets $X$, $Y$ and $Z$, respectively, then we defined $\mathrm{Oct}(f)$ to be
$\mathbb{E}_{x,x',y,y',z,z'}f_{x,x',y,y',z,z'}$. In full, this is the expectation over all $x,x'\in X$, $y,y'\in Y$ and
$z,z'\in Z$ of

$$f(x,y,z)f(x,y,z')f(x,y',z)f(x,y',z')f(x',y,z)f(x',y,z')f(x',y',z)f(x',y',z').$$

Similarly, if $f$ is a function of $k$ variables $x_1,\dots,x_k$, with each $x_i$ taken from a set $X_i$,
then

$$\mathrm{Oct}(f)=\mathbb{E}_{x_1^0,x_1^1\in X_1}\dots\mathbb{E}_{x_k^0,x_k^1\in X_k}\prod_{\epsilon\in\{0,1\}^k}f(x_1^{\epsilon_1},\dots,x_k^{\epsilon_k}).$$

In the spirit of the previous section, we can (and shall) also write this as $\mathbb{E}_\sigma f_\sigma$, where $\sigma$ is
shorthand for $x_1,x_1',\dots,x_k,x_k'$.
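The general formula for $\mathrm{Oct}(f)$ can be evaluated by brute force when the sets are small. The sketch below assumes that $f$ is given as a Python callable of $k$ arguments and that the sets $X_1,\dots,X_k$ are passed as a list of finite iterables; the name `oct_value` is an illustrative choice.

```python
from itertools import product


def oct_value(f, sets):
    """Average over all (x_i^0, x_i^1) in X_i^2 of the product of f over the
    2^k vertices (x_1^{e_1}, ..., x_k^{e_k}) of the combinatorial octahedron."""
    k = len(sets)
    pair_choices = [list(product(X, repeat=2)) for X in sets]
    total = 0.0
    count = 0
    for pairs in product(*pair_choices):
        term = 1.0
        for eps in product((0, 1), repeat=k):
            term *= f(*(pairs[i][eps[i]] for i in range(k)))
        total += term
        count += 1
    return total / count
```

As a quick check, a constant function 1 has $\mathrm{Oct}=1$, while for $f(x,y)=(-1)^{xy}$ on $\{0,1\}^2$ the product over the four vertices is $(-1)^{(x+x')(y+y')}$, whose average is $1/2$.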

To give a formal definition in more general situations it is convenient to use the
language of $A$-functions, though in fact we shall try to avoid this by assuming without loss
of generality that the set $A$ we are talking about is the set $\{1,2,\dots,k\}$. Nevertheless, here
is the definition. As before, let $\mathcal{J}$ and $\mathcal{H}$ be $r$-partite chains with vertex sets $E_1,\dots,E_r$
and $X_1,\dots,X_r$, let $\Phi$ be the set of all $r$-partite maps from $E_1\cup\dots\cup E_r$ to $X_1\cup\dots\cup X_r$
and let $A\in\mathcal{J}$. We can think of an $A$-function as a function defined on the product of
those $X_i$ for which $i\in i(A)$. However, we can also think of it as a function $f^A$ defined on
$\Phi$ such that $f^A(\phi)$ depends only on $\phi(A)$. To define $\mathrm{Oct}(f^A)$ in these terms, we construct
a set system $\mathcal{B}$ as follows. Let $k$ be the cardinality of the set $A$. For each $i\in i(A)$ let $U_i$
be a set of cardinality 2, let $U$ be the union of the $U_i$ (which we suppose to be disjoint)
and let $\mathcal{B}$ consist of the $2^k$ sets $B\subset U$ such that $|B\cap U_i|=1$ for every $i$. Let $\Omega$ be the set
of all $k$-partite maps $\omega$ from $\bigcup_{i\in i(A)}U_i$ to $\bigcup_{i\in i(A)}X_i$ (meaning that $\omega(U_i)\subset X_i$ for every
$i\in i(A)$).

We now want to use $f^A$, which is defined on $\Phi$, to define a $B$-function $f^B$ on $\Omega$, for
each $B\in\mathcal{B}$. There is only one natural way to do this. Given $\omega\in\Omega$ and $B\in\mathcal{B}$, we would
like $f^B(\omega)$ to depend on $\omega(B)$; we know that $B$ and $\omega(B)$ have the same index as $A$; so
we choose some $\phi\in\Phi$ such that $\phi(A)=\omega(B)$ and define $f^B(\omega)$ to be $f^A(\phi)$. This is
well-defined, since if $\phi(A)=\phi'(A)$, then $f^A(\phi)=f^A(\phi')$, because $f^A$ is an $A$-function.

We now define

$$\mathrm{Oct}(f^A)=\mathbb{E}_{\omega\in\Omega}\prod_{B\in\mathcal{B}}f^B(\omega).$$

Let us see why this agrees with our earlier definition. There, for simplicity, we took $A$ to
be the set $\{1,2,\dots,k\}$. Then for each $i\le k$ we let $U_i=\{x_i^0,x_i^1\}$, and $\mathcal{B}$ consisted of all
sets of the form $B_\epsilon=\{x_1^{\epsilon_1},\dots,x_k^{\epsilon_k}\}$, with $\epsilon=(\epsilon_1,\dots,\epsilon_k)\in\{0,1\}^k$. The set $\Omega$ was the
set of all ways of choosing $x_i^0$ and $x_i^1$ in $X_i$, for each $i\le k$. (Again there is a deliberate
ambiguity in our notation. When we say that $U_i=\{x_i^0,x_i^1\}$ we are thinking of $x_i^0$ and
$x_i^1$ as symbols for variables, and when we choose elements of $X_i$ with those names, we
are thinking of this choice as a function from the set $\{x_i^0,x_i^1\}$ of symbols to the set $X_i$.)
Given $\omega\in\Omega$ and $B=B_\epsilon\in\mathcal{B}$, we have to define $f^{B_\epsilon}(\omega)$. In principle a function of $\omega$ can
depend on all the variables $x_i^0$ and $x_i^1$, but $f^{B_\epsilon}$ is a $B_\epsilon$-function, and therefore depends
just on the variables $x_i^{\epsilon_i}$. Now $\Phi$ can be thought of as the set of ways of choosing $y_i\in X_i$
for each $i\le k$. In other words, we regard $A$ as the set of variables $\{y_1,\dots,y_k\}$ and $\phi$
as a way of assigning values to these variables. Thus, to define $f^{B_\epsilon}(\omega)$ we choose $\phi$ such
that $\phi(A)=\omega(B_\epsilon)$, which means that $\phi(y_i)$ must equal $\omega(x_i^{\epsilon_i})$ for each $i$. (Equivalently,
thinking of $y_i$ and $x_i^{\epsilon_i}$ as the assigned values, it means merely that $x_i^{\epsilon_i}$ must equal $y_i$.) But
then $f(\phi)=f(y_1,\dots,y_k)=f(x_1^{\epsilon_1},\dots,x_k^{\epsilon_k})$. And now it is clear that the two expressions
for $\mathrm{Oct}(f)$ denote the same quantity.

3.6. Octahedral quasirandomness.

We come now to the first of two definitions that are of great importance for this paper.
Let $\mathcal{H}$ be a chain, let $f^A$ be an $A$-function, for some $A$ that does not necessarily belong
to $\mathcal{H}$, and suppose that $f^A$ is supported in $H_*(A)$, in the sense that $f^A(\phi)=0$ whenever
$\phi(A)\notin H_*(A)$. Equivalently, suppose that whenever $f^A(\phi)\ne 0$ we have $\phi(C)\in\mathcal{H}$
for every proper subset $C\subset A$. Loosely speaking, we shall say that $f^A$ is octahedrally
quasirandom relative to $\mathcal{H}$ if $\mathrm{Oct}(f^A)$ is significantly smaller than one might expect.

To turn this idea into a precise definition, we need to decide what we expect. Let
$\mathcal{B}$ be the set system defined in the previous subsection. If $B\in\mathcal{B}$, then $f^B(\omega)$ is defined
to be the value of $f^A(\phi)$ for any $\phi$ with $\phi(A)=\omega(B)$. If $f^B(\omega)\ne 0$, then $f^A(\phi)\ne 0$, so
$\phi(A)\in H_*(A)$, by assumption, and hence $\omega(B)\in H_*(A)$. Therefore, a necessary condition
for $\prod_{B\in\mathcal{B}}f^B(\omega)$ to be non-zero is that $\omega(D)\in\mathcal{H}$ for every $D$ that is a proper subset of
some $B\in\mathcal{B}$. Let $\mathcal{K}'$ be the chain consisting of all such sets. Thus, $\mathcal{K}'$ consists of all subsets
of $U_1\cup\dots\cup U_k$ that intersect each $U_i$ in at most a singleton and do not intersect every $U_i$.
Then, since $|f^B(\omega)|\le 1$ for every $B$ and every $\omega$, a trivial upper bound for $\mathrm{Oct}(f^A)$ is

$$\mathbb{E}_{\omega\in\Omega}\prod_{D\in\mathcal{K}'}H^D(\omega),$$

which we shall call $\mathrm{Oct}(H_*(A))$, since it counts the number of (labelled, possibly
degenerate) combinatorial $k$-dimensional octahedra in $H_*(A)$.

We could if we wanted declare $\mathrm{Oct}(f^A)$ to be small if it is small compared with
$\mathrm{Oct}(H_*(A))$. Instead, however, since we shall be working exclusively with quasirandom
chains, it turns out to be more convenient to work out how many octahedra we expect
$H(A)$ to have, given the various relative densities, and use that quantity for comparison.
(It might seem more natural to use $H_*(A)$, but for the particular functions that we shall
need to consider, $\mathrm{Oct}(f^A)$ will tend to be controlled by the smaller quantity $\mathrm{Oct}(H(A))$.
But in the end this is not too important because when we are looking at $\mathrm{Oct}(f^A)$ we think
of the density $\delta_A$ as "large".)

Let us therefore write $\mathcal{K}$ for the set of all subsets of sets in $\mathcal{B}$ (so $\mathcal{K}=\mathcal{B}\cup\mathcal{K}'$). It is
helpful to recall the interpretation of relative densities as conditional probabilities. Suppose
that we choose $\omega$ randomly from $\Omega$, and also that $\mathcal{H}$ behaves in a random way. Then the
probability that $H^D(\omega)=1$ given that $H^C(\omega)=1$ for every $C\subsetneq D$ is the probability
that $\omega(D)\in\mathcal{H}$ given that $\omega(C)\in\mathcal{H}$ for every $C\subsetneq D$, which is $\delta_D$. Because $\mathcal{H}$ behaves
randomly, we expect all these conditional probabilities to be independent, so we expect
that $\mathbb{E}_{\omega\in\Omega}\prod_{D\in\mathcal{K}}H^D(\omega)$ will be approximately $\prod_{D\in\mathcal{K}}\delta_D$. Accordingly, we shall say that
$f^A$ is $\eta$-octahedrally quasirandom if

$$\mathrm{Oct}(f^A)\le\eta\prod_{D\in\mathcal{K}}\delta_D.$$

Since octahedral quasirandomness is the only form of quasirandomness that we use in this
paper, we shall often omit the word "octahedrally" from this definition.

It is not necessary to do so, but one can rewrite the right-hand side more explicitly.
For each subset $C\subset A$, there are $2^{|C|}$ sets $D\in\mathcal{K}$ with the same index as $C$. (We can
think of these as $|C|$-dimensional faces of the octahedron with index $i(C)$.) Therefore,

$$\prod_{D\in\mathcal{K}}\delta_D=\prod_{C\subset A}\delta_C^{2^{|C|}}.$$
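The count of faces used in this rewriting can be confirmed mechanically. The sketch below assumes $A=\{0,1,\dots,k-1\}$ and $U_i=\{(i,0),(i,1)\}$, builds the $2^k$ top-level sets and all their subsets, and tallies the subsets by index; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product


def octahedron_faces(k):
    """Return (top, faces): the 2^k transversals of the pairs U_i, and the
    system of all subsets of those transversals."""
    top = [frozenset((i, eps[i]) for i in range(k))
           for eps in product((0, 1), repeat=k)]
    faces = set()
    for B in top:
        members = sorted(B)
        for size in range(k + 1):
            faces.update(frozenset(c) for c in combinations(members, size))
    return top, faces


def faces_by_index(faces):
    """Number of faces of each index (the set of coordinates i that are used)."""
    return Counter(frozenset(i for i, _ in D) for D in faces)
```

Each index $C$ receives exactly $2^{|C|}$ faces, so the whole system has $\sum_C 2^{|C|}=3^k$ sets, in agreement with the exponents $2^{|C|}$ in the product.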

The main use of the definition of quasirandomness for $A$-functions is to give us a precise
way of saying what it means for a $k$-partite $k$-uniform hypergraph to "sit quasirandomly
inside a $k$-partite $(k-1)$-chain". Let $A$ and $\mathcal{H}$ be as above. The $k$-uniform hypergraph
we would like to discuss is $H(A)$. Associated with this hypergraph is its "characteristic
function" $H^A$ and its relative density $\delta_A$. The $(k-1)$-chain is the set of all edges of $\mathcal{H}$
with index some proper subset of $A$. Define an $A$-function $f^A$ by setting $f^A(\phi)$ to equal
$H^A(\phi)-\delta_A$ if $\phi(A)\in H_*(A)$ and zero otherwise. An important fact about $f^A$ is that its
average is zero. To see this, note that $f^A(\phi)=H^A(\phi)-\delta_A$ when $\phi(A)\in H_*(A)$ and
$f^A(\phi)=0$ otherwise. Therefore, the average over all $\phi$ such that $\phi(A)\notin H_*(A)$ is trivially
zero, while the average over all $\phi$ such that $\phi(A)\in H_*(A)$ is zero because $\delta_A$ is the relative
density of $H(A)$ in $H_*(A)$.

We shall say that $H(A)$ is $\eta$-octahedrally quasirandom, or just $\eta$-quasirandom, relative
to $\mathcal{H}$, if the function $f^A$ is $\eta$-quasirandom according to the definition given earlier. The
counting lemma, which we shall prove in §5, will show that if $\mathcal{H}$ is an $r$-partite chain and
all its different parts of the form $H(A)$ are quasirandom in this sense, then $\mathcal{H}$ behaves like
a random chain with the same relative densities.

3.7. Quasirandom chains. 

We are now ready for the main definition in terms of which our counting and regularity
lemmas will be stated. Roughly speaking, a chain $\mathcal{H}$ is quasirandom if $H(A)$ is highly
quasirandom relative to $\mathcal{H}$. However, there is an important subtlety to the definition,
which is that when we apply it we do so in situations where the relative densities $\delta_A$ tend
to be very much smaller when the sets $A$ are smaller, as we saw in the second example
of the previous section. For this reason, we need to make much stronger quasirandomness
assumptions about $H(A)$ when $A$ is small, and it is also very important which of these
assumptions depend on which densities. The full details of the following definition are not
too important (they are chosen to make the proof work), but the dependences certainly
are.

One other comment is that our definition depends on a chain $\mathcal{J}$. This is useful for
an inductive hypothesis later. Roughly, if $\mathcal{H}$ is quasirandom with respect to $\mathcal{J}$, then $\mathcal{J}$
embeds into $\mathcal{H}$ in the expected way. Thus, the bigger $\mathcal{J}$ is, the stronger the statement.

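Now let us turn to the precise definition. Suppose that $\mathcal{J}$ and $\mathcal{H}$ are $r$-partite chains.
For each $A\in\mathcal{J}$, let the relative density of $H(A)$ in $\mathcal{H}$ be $\delta_A$ and suppose that $H(A)$ is
relatively $\eta_A$-quasirandom. Define a sequence $\epsilon_k,\epsilon_{k-1},\dots,\epsilon_1$ by taking $\epsilon_k=\epsilon$ and

$$\epsilon_{k-j}=2^{-jk-1}|\mathcal{J}|^{-1}\Bigl(\epsilon_{k-j+1}\prod_{A\in\mathcal{J},\ |A|\ge k-j+1}\delta_A\Bigr)^{2^{jk}}$$

when $j\ge 1$. Let $\eta_{k-j}$ be defined by the formula

$$\eta_{k-j}=\frac{1}{2}\Bigl(\epsilon_{k-j}\prod_{A\in\mathcal{J},\ |A|\ge k-j}\delta_A\Bigr)^{2^{k(j+1)}}$$

for each $j$. Then $\mathcal{H}$ is $(\epsilon,\mathcal{J},k)$-quasirandom if, for every $A\in\mathcal{J}$ of size $j\le k$, we have the
inequality $\eta_A\le\eta_j$, or in other words $H(A)$ is $\eta_j$-quasirandom relative to $H_*(A)$.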

The parameter $k$ is also there just for convenience in our eventual inductive argument.
The counting lemma will imply that if $\phi$ is a random $r$-partite map from $\mathcal{J}$ to an $(\epsilon,\mathcal{J},k)$-
quasirandom chain $\mathcal{H}$, and if all sets in $\mathcal{J}$ have size at most $k$, then the probability that
$\phi$ is a homomorphism differs from $\prod_{A\in\mathcal{J}}\delta_A$ by at most $\epsilon|\mathcal{J}|\prod_{A\in\mathcal{J}}\delta_A$.

§4. The main lemma from which all else follows. 

Before we tackle our main lemma it will help to prepare for it in advance with a small
further discussion of terminology. Let $\mathcal{H}$ be an $r$-partite chain with vertex sets $X_1,\dots,X_r$.
Let $t\ge r$ and let $x_1,\dots,x_t$ be variables such that $x_i$ ranges over $X_i$ when $i\le r$ and over
some other $X_j$ if $i>r$. For each $j\le r$ let $E_j$ be the set of $i$ such that $x_i$ ranges over $X_j$
(so, in particular, $i\in E_i$ when $i\le r$).

Now let $\mathcal{J}$ be an $r$-partite chain with vertex sets $E_1,\dots,E_r$. Suppose that the set
$\{1,2,\dots,k\}$ does not belong to $\mathcal{J}$ but that all its proper subsets do.

We shall write $\tau$ for the sequence $(x_1,\dots,x_t)$. Note that there is a one-to-one
correspondence between such sequences and $r$-partite maps from $E_1\cup\dots\cup E_r$ to $X_1\cup\dots\cup X_r$,
so we can also think of $\tau$ as such a map.

Our aim will be to find an upper bound for the modulus of a quantity of the form

$$\mathbb{E}_\tau f(\tau)\prod_{A\in\mathcal{J}}g^A(\tau),$$

where $f$ is any function from $X_1\times\dots\times X_r$ to $\mathbb{R}$, and each $g^A$ is an $A$-function supported
in $\mathcal{H}$ and taking values in $[-1,1]$. By $f(\tau)$ we mean $f(x_1,\dots,x_r)$, but for convenience we
add in the other variables on which $f$ does not depend.

In order to shorten the statement of the next lemma, let us describe in advance a
chain $\mathcal{K}$ that appears in its conclusion. For each $i\le t$ we shall have a set $W_i$ of the form
$\{i\}\times U_i$, where $U_i$ is a finite subset of $\mathbb{N}$. The chain $\mathcal{K}$ will be an $r$-partite chain with vertex
sets $F_1,\dots,F_r$, where $F_j=\bigcup_{i\in E_j}W_i$. We shall use the vertices of $\mathcal{K}$ to index variables
as follows: the element $(i,h)$ of $W_i$ indexes a variable that we shall call $x_i^h$. When $i\le k$
the sets $U_i$ will be chosen in such a way that $(i,0)$ and $(i,1)$ both belong to $W_i$: it will
sometimes be convenient to use the alternative names $x_i$ and $x_i'$ for $x_i^0$ and $x_i^1$.

We shall use the letter $\omega$ to stand for the sequence of all variables $x_i^h$, enumerated
somehow. Equivalently, we can think of $\omega$ as an $r$-partite map from $F_1\cup\dots\cup F_r$ to
$X_1\cup\dots\cup X_r$.

Let $\sigma$ be shorthand for the sequence $x_1,x_1',x_2,x_2',\dots,x_k,x_k'$. Generalizing the
notation from §2, if $f:X_1\times\dots\times X_r\to\mathbb{R}$ we shall write $f_\sigma(\omega)$ for the expression
$\prod_{\epsilon\in\{0,1\}^k}f(x_1^{\epsilon_1},\dots,x_k^{\epsilon_k},x_{k+1},\dots,x_r)$. Once again, $\omega$ contains many more variables than
the ones that appear in this expression, but since / does not depend on them the notation 
is unambiguous. (In fact, when we come to apply the lemma, / will not even depend on 

Lemma 4.1. Let the chains $\mathcal{H}$ and $\mathcal{J}$ be as just described. Then there is a chain $\mathcal{K}$ of
the kind that has also just been described, with the following properties.

(i) Every set in $\mathcal{K}$ has cardinality less than $k$.

(ii) Let $\gamma:F_1\cup\dots\cup F_r\to E_1\cup\dots\cup E_r$ be the $r$-partite map $(i,h)\mapsto i$. (That is,
for each $i\le t$, $\gamma$ takes the elements of $W_i$ to $i$.) Then $\gamma$ is a homomorphism from $\mathcal{K}$ to $\mathcal{J}$,
and for each $A\in\mathcal{J}$ of cardinality less than $k$ there are precisely $2^k$ sets $B\in\mathcal{K}$ such that
$\gamma(B)=A$.

(iii) If $f$ is any function from $X_1\times\dots\times X_r$ to $\mathbb{R}$ and each $g^A$ is an $A$-function supported
in $\mathcal{H}$ and taking values in $[-1,1]$, then we have the inequality

$$\Bigl(\mathbb{E}_\tau f(\tau)\prod_{A\in\mathcal{J}}g^A(\tau)\Bigr)^{2^k}\le\mathbb{E}_\omega f_\sigma(\omega)\prod_{B\in\mathcal{K}}H^B(\omega).$$

Proof. We shall prove this result by induction. To do this we shall show that for each
$j\le k$ the left-hand side can be bounded above by a quantity of the following form, which
we shall write first and then interpret:

$$\mathbb{E}_{\omega_j}\Bigl(\prod_{A\in\mathcal{K}_j}H^A(\omega_j)\Bigr)\Bigl(\mathbb{E}_{\tau_j}f_{\sigma_j}(\tau_j)\prod_{[j]\subset A}(g^A)_{\sigma_j}(\tau_j)\prod_{[j]\not\subset A,\ |A| < k}(H^A)_{\sigma_j}(\tau_j)\Bigr)^{2^{k-j}}.$$

The set system $\mathcal{K}_j$ here is a chain. Each vertex of $\mathcal{K}_j$ belongs to a set of the form
$V_i^j=\{i\}\times U_i^j$ for some $i$ and some finite subset $U_i^j$ of $\mathbb{N}$. The vertices are partitioned into
$r$ sets $E_1^j,\dots,E_r^j$, where $E_i^j=\bigcup_{h\in E_i}V_h^j$. As before, $x_h^q$ stands for a variable indexed by
the pair $(h,q)\in V_h^j$. In the back of our minds, we identify $(i,0)$ with $i$ when $i\le r$: in
particular, we shall sometimes write $x_i$ instead of $x_i^0$, and if $j\le k$ we shall sometimes write
$[j]$ for the set $\{(1,0),(2,0),\dots,(j,0)\}$ rather than the more usual $\{1,2,\dots,j\}$. We shall
also sometimes write $x_i'$ for $x_i^1$.

For the products in the second bracket we have not mentioned the condition $A\in\mathcal{J}$,
which always applies. In other words, the products are over all sets $A\in\mathcal{J}$ that satisfy
the conditions specified underneath the product signs. We write $\sigma_j$ as shorthand for
$(x_1,x_1',\dots,x_j,x_j')$. We also write $\tau_j$ for the sequence $(x_{j+1},\dots,x_t)$. We define the sets $V_i^j$
in such a way that $V_i^0$ is the singleton $\{(i,0)\}$ and is a subset of each $V_i^j$: it is only the
first bracket that depends on the new variables. Finally, $\omega_j$ is an enumeration of all the
variables that are not included in $\tau_j$.

We shall not specify what the edges of the chain $\mathcal{K}_j$ are (though in principle it would
be possible to specify them exactly), since all that concerns us is that the map $\gamma$ that
takes $(i,h)$ to $i$ is a homomorphism from $\mathcal{K}_j$ to $\mathcal{J}$ such that, for each $A\in\mathcal{J}$ of cardinality
less than $k$, the number of sets $B\in\mathcal{K}_j$ with $\gamma(B)=A$ is $2^k-2^{k-j+|A\cap[j]|}$ if $A\not\subset[j]$ and
$2^k-2^{|A|}$ if $A\subset[j]$.

Let us explain these last numbers. They are what we need for the inequality to be
properly homogeneous in the way that we discussed in §2. To see why they are the correct
numbers, let us think about a function of the form $(H^A)_{\sigma_j}=(H^A)_{x_1,x_1',\dots,x_j,x_j'}$. For each
$i\le j$ such that $i\notin A$, there is no dependence of $(H^A)_{\sigma_j}(\tau_j)$ on $x_i$ or $x_i'$, so in order
for $(H^A)_{\sigma_j}(\tau_j)$ not to be zero, the number of distinct sets that are required to belong
to $\mathcal{H}$ is $2^{|A\cap[j]|}$. When we raise to the power $2^{k-j}$, this must happen $2^{k-j}$ times, all
independently, except that if $A\subset[j]$ then $H^A$ does not depend on any of the variables
in $\tau_j$ so it needs to happen just once. Thus, the number of sets required to be in $\mathcal{H}$ is
$2^{k-j}2^{|A\cap[j]|}=2^{k-j+|A\cap[j]|}$ when $A\not\subset[j]$, and it is $2^{|A\cap[j]|}=2^{|A|}$ when $A\subset[j]$. This falls
short of $2^k$ and the difference must be made up for in the first bracket.

Now that we have discussed the inductive hypothesis in detail, let us prove it by 
repeating once again the basic technique: isolate one variable and sum over it last, apply 
Cauchy-Schwarz carefully, expand out a square, rearrange, and apply Cauchy-Schwarz 
carefully again. 

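As we did repeatedly in §2, we shall leave the first bracket and concentrate on the
second. That is, we shall find an upper bound for

$$\Bigl(\mathbb{E}_{\tau_j}f_{\sigma_j}(\tau_j)\prod_{[j]\subset A}(g^A)_{\sigma_j}(\tau_j)\prod_{[j]\not\subset A,\ |A| < k}(H^A)_{\sigma_j}(\tau_j)\Bigr)^{2^{k-j}}.$$

Let us write $\tau_j$ as $(x_{j+1},\tau_{j+1})$. The quantity above equals

$$\Bigl(\Bigl(\mathbb{E}_{\tau_{j+1}}\mathbb{E}_{x_{j+1}}f_{\sigma_j}(\tau_j)\prod_{[j]\subset A}(g^A)_{\sigma_j}(\tau_j)\prod_{[j]\not\subset A,\ |A| < k}(H^A)_{\sigma_j}(\tau_j)\Bigr)^2\Bigr)^{2^{k-j-1}}.$$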

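Applying Cauchy-Schwarz, we find that this is at most the product of

$$\Bigl(\mathbb{E}_{\tau_{j+1}}\prod_{[j]\subset A,\ j+1\notin A}\bigl((g^A)_{\sigma_j}(\tau_{j+1})\bigr)^2\prod_{[j]\not\subset A,\ j+1\notin A,\ |A| < k}\bigl((H^A)_{\sigma_j}(\tau_{j+1})\bigr)^2\Bigr)^{2^{k-j-1}}$$

and

$$\Bigl(\mathbb{E}_{\tau_{j+1}}\Bigl(\mathbb{E}_{x_{j+1}}f_{\sigma_j}(\tau_j)\prod_{[j+1]\subset A}(g^A)_{\sigma_j}(\tau_j)\prod_{[j+1]\not\subset A,\ |A| < k}(H^A)_{\sigma_j}(\tau_j)\Bigr)^2\Bigr)^{2^{k-j-1}}.$$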

Before we continue, let us briefly see what principle was used when we decided how to
apply Cauchy-Schwarz. The idea was to take all terms that did not depend on $x_{j+1}$ out to
the left of the expectation over $x_{j+1}$, except that each time we took out a $(g^A)_{\sigma_j}$ or an
$(H^A)_{\sigma_j}$, we left an $(H^A)_{\sigma_j}$ behind, exploiting the fact that $(g^A)_{\sigma_j}(H^A)_{\sigma_j}=(g^A)_{\sigma_j}$ and
$(H^A)_{\sigma_j}(H^A)_{\sigma_j}=(H^A)_{\sigma_j}$. In this way, we extracted maximum information from the
Cauchy-Schwarz inequality.

Since each $g^A$ is an $A$-function supported in $\mathcal{H}$, and it maps to $[-1,1]$, and since each
$H^A$ takes values 0 or 1, we will not decrease the first term in the product if we replace it
by

$$\Bigl(\mathbb{E}_{\tau_{j+1}}\prod_{[j]\subset A,\ j+1\notin A,\ |A| < k}(H^A)_{\sigma_j}(\tau_{j+1})\prod_{[j]\not\subset A,\ j+1\notin A,\ |A| < k}(H^A)_{\sigma_j}(\tau_{j+1})\Bigr)^{2^{k-j-1}},$$

which we can write more succinctly as

$$\Bigl(\mathbb{E}_{\tau_{j+1}}\prod_{j+1\notin A,\ |A| < k}(H^A)_{\sigma_j}(\tau_{j+1})\Bigr)^{2^{k-j-1}}.$$



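To deal with the second term, we first have to expand out the square, which in our notation
is rather simple: we obtain

$$\Bigl(\mathbb{E}_{\tau_{j+1}}\mathbb{E}_{x_{j+1},x_{j+1}'}f_{\sigma_{j+1}}(\tau_{j+1})\prod_{[j+1]\subset A}(g^A)_{\sigma_{j+1}}(\tau_{j+1})\prod_{[j+1]\not\subset A,\ |A| < k}(H^A)_{\sigma_{j+1}}(\tau_{j+1})\Bigr)^{2^{k-j-1}}.$$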

We now apply Hölder's inequality. This time we take to the left of the expectation over
$\tau_{j+1}$ all terms that have no dependence on $\tau_{j+1}$, again leaving behind the corresponding
$(H^A)_{\sigma_{j+1}}$ terms as we do so. The one exception is that, for convenience only, we do not
take the term $(g^A)_{\sigma_{j+1}}$ to the left when $A=[j+1]$, but instead take out $(H^A)_{\sigma_{j+1}}$ in this
case. The result is that the last quantity is bounded above by the product of

[displayed formula not recoverable from the source]

and

[displayed formula not recoverable from the source]



These calculations have given us the expression we started with, inside an expectation,
with $j$ replaced by $j+1$. We must therefore check that we also have a chain $\mathcal{K}_{j+1}$ with
the right properties. Looking back at the various brackets we have discarded, this tells us
that we want to rewrite the expression

[displayed product of three brackets not recoverable from the source]

as

[displayed formula not recoverable from the source]

for a chain $\mathcal{K}_{j+1}$ with properties analogous to those of $\mathcal{K}_j$.

There is a slight abuse of notation above, because after our applications of the Cauchy-
Schwarz and Hölder inequalities we have ended up overusing $\tau_{j+1}$, $x_{j+1}$ and $\sigma_{j+1}$. But
we can cure this by renaming the variables in the expression we wish to rewrite. Indeed,
since we are raising the expectation over $\tau_{j+1}=(x_{j+2},\dots,x_t)$ to the power $2^{k-j-1}$, let
us introduce $2^{k-j-1}$ new variables for each variable included in $\tau_{j+1}$. More precisely, let
us choose a set $U$ of cardinality $2^{k-j-1}$ that is disjoint from $U_i^j$ for every $i$ between $j+1$
and $t$ and replace $V_i^j=\{i\}\times U_i^j$ by $\{i\}\times(U_i^j\cup U)$. We can then expand out the second
bracket as an expectation over the variables $x_1,x_1',\dots,x_j,x_j'$ and $x_i^u$ with $i\ge j+2$ and
$u\in U$ of the product of all expressions of the form $(H^A)_{\sigma_j}(\tau^u)$, where $\tau^u=(x_{j+1}^u,\dots,x_t^u)$.
(In fact, there is no dependence on $x_{j+1}^u$, but we add the variables anyway so that it looks
slightly nicer.)

In a similar way, we can expand out the third bracket and introduce a further
$2(2^{k-j-1}-1)$ new variables into $V_{j+1}^j$. When we do these expansions, we end up writing
the expression in the desired form for some set-system $\mathcal{K}_{j+1}$. It is not hard to see that
$\mathcal{K}_{j+1}$ is a chain, so it remains to prove that it contains the right number of sets of each
index.

Let $\gamma$ be the usual projection $(i,h)\mapsto i$. We need to prove that each set $A\in\mathcal{J}$ of
cardinality less than $k$ has the right number of preimages under $\gamma$ in $\mathcal{K}_{j+1}$. We consider various
cases.

First, if $A$ is a subset of $[j]$, then $\mathcal{K}_j$ (which we can think of as a chain defined on
the vertex sets of $\mathcal{K}_{j+1}$) already contains $2^k-2^{|A|}$ preimages of $A$. Since the additional
vertices $(i,u)$ do not project into $[j]$, we do not create any new preimages in $\mathcal{K}_{j+1}$.

Now suppose that $A$ is a subset of $[j+1]$ that contains $j+1$. Then $A\not\subset[j]$ so the
number of preimages of $A$ in $\mathcal{K}_j$ is $2^k-2^{k-j+|A\cap[j]|}$. No new preimages come from the
second bracket, since that involves only sets that do not include $j+1$, while from the third
bracket we obtain $2^{|A\cap[j+1]|}(2^{k-j-1}-1)$ preimages. But $2^{k-j-1+|A\cap[j+1]|}=2^{k-j+|A\cap[j]|}$
in this case, so the total number of preimages is $2^k-2^{|A\cap[j+1]|}=2^k-2^{|A|}$.

Next, suppose that $A\not\subset[j+1]$ and $j+1\in A$. Then $\mathcal{K}_j$ contains $2^k-2^{k-j+|A\cap[j]|}$
preimages of $A$ and the second and third brackets do not contribute any. Since $k-j+|A\cap[j]|
=k-j-1+|A\cap[j+1]|$, the total number of preimages is $2^k-2^{k-j-1+|A\cap[j+1]|}$,
as we want.

Finally, suppose that $A\not\subset[j+1]$ and $j+1\notin A$. In that case, $\mathcal{K}_j$ contains
$2^k-2^{k-j+|A\cap[j]|}$ preimages, the third bracket contributes none, and the second bracket
contributes $2^{|A\cap[j]|}2^{k-j-1}=2^{k-j-1+|A\cap[j]|}$ preimages. Thus, the total number of preimages
is $2^k-2^{k-j-1+|A\cap[j]|}$, which equals $2^k-2^{k-j-1+|A\cap[j+1]|}$.
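The arithmetic in these four cases is easy to get wrong, so it may be worth checking it mechanically. The sketch below takes the stated preimage counts for $\mathcal{K}_j$ and the stated contributions of the second and third brackets as given, and verifies that they are consistent for every non-empty $A\subset\{1,\dots,t\}$ with $|A| < k$; it also checks that adding the $2^{|A\cap[k]|}$ sets that come from the second product at $j=k$ brings the total to $2^k$. The function names are illustrative, and only the bookkeeping is tested, not the construction of the chains themselves.

```python
from itertools import combinations


def claimed_count(A, j, k):
    """Number of preimages of A in K_j, as stated in the proof."""
    prefix = set(range(1, j + 1))
    if A <= prefix:
        return 2 ** k - 2 ** len(A)
    return 2 ** k - 2 ** (k - j + len(A & prefix))


def step_contribution(A, j, k):
    """New preimages of A created when passing from K_j to K_{j+1}."""
    prefix = set(range(1, j + 1))
    a_j = len(A & prefix)
    if A <= prefix:                      # A inside [j]: nothing new
        return 0
    if A <= prefix | {j + 1}:            # A inside [j+1], contains j+1
        return 2 ** (a_j + 1) * (2 ** (k - j - 1) - 1)   # third bracket
    if j + 1 in A:                       # A not inside [j+1], contains j+1
        return 0
    return 2 ** a_j * 2 ** (k - j - 1)   # second bracket


def verify(k, t):
    """Check the four cases for all non-empty A in {1..t} with |A| < k."""
    for size in range(1, k):
        for members in combinations(range(1, t + 1), size):
            A = set(members)
            count = claimed_count(A, 0, k)
            if count != 0:               # K_0 has no sets
                return False
            for j in range(k):
                count += step_contribution(A, j, k)
                if count != claimed_count(A, j + 1, k):
                    return False
            # At j = k the second product supplies 2^{|A ∩ [k]|} more sets.
            if count + 2 ** len(A & set(range(1, k + 1))) != 2 ** k:
                return False
    return True
```

Calling `verify(k, t)` for a few small values of $k\le t$ exercises all four cases, including sets $A$ that are not contained in $[k]$.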

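This completes the proof of the inductive step. All that remains is the simple task of
checking that the case $j=k$ of the induction is the statement that we wish to prove. But
when $j=k$, we have the upper bound

$$\mathbb{E}_{\omega_k}\Bigl(\prod_{A\in\mathcal{K}_k}H^A(\omega_k)\Bigr)\Bigl(\mathbb{E}_{\tau_k}f_{\sigma_k}(\tau_k)\prod_{[k]\subset A}(g^A)_{\sigma_k}(\tau_k)\prod_{[k]\not\subset A,\ |A| < k}(H^A)_{\sigma_k}(\tau_k)\Bigr)^{2^{k-k}}.$$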

The most obvious simplification is 1 for $2^{k-k}$. Since $\mathcal{J}$ does not contain the set $[k]$, the
first product in the second bracket disappears. This gives us the upper bound

$$\mathbb{E}_{\omega_k,\tau_k}f_{\sigma_k}(\tau_k)\prod_{A\in\mathcal{K}_k}H^A(\omega_k)\prod_{|A| < k}(H^A)_{\sigma_k}(\tau_k).$$

Writing $\omega$ for $(\omega_k,\tau_k)$ and letting $\mathcal{K}$ be the union of the sets in $\mathcal{K}_k$ and the sets implied by
the second product (we will say what these are in a moment), we can write this as

$$\mathbb{E}_\omega f_\sigma(\omega)\prod_{A\in\mathcal{K}}H^A(\omega)$$

as required.

We still need to check that $\mathcal{K}$ contains precisely $2^k$ preimages of each set $A\in\mathcal{J}$ of
cardinality less than $k$. Let us therefore be slightly more explicit about the "sets implied
by the second product." A function $(H^A)_{\sigma_k}(\tau_k)$ is a product of functions of the form
$H^A(x_1^{\epsilon_1},\dots,x_k^{\epsilon_k},\tau_k)$. But $H^A$ depends only on the variables in $A$, so the number of distinct
functions in the product is $2^{|A\cap[k]|}$, and thus the number of preimages of $A$ in $\mathcal{K}$ that come
from the second product is $2^{|A\cap[k]|}$. But when $j=k$, the number of preimages in $\mathcal{K}_k$ is
$2^k-2^{|A\cap[k]|}$, whether or not $A$ is a subset of $[k]$. Therefore, for each set $C\subset\{1,2,\dots,r\}$
of cardinality less than $k$, the chain $\mathcal{K}$ contains precisely $2^k$ sets of index $C$ for each set
$A\in\mathcal{J}$ of index $C$, as claimed. □

As we shall see in the next section, the fact that the sets in $\mathcal{K}$ have cardinality at most $k-1$ allows us to use Lemma 4.1 inside another induction (in fact, a double induction). This corresponds to the second part of §2, where we replaced functions such as $G(x, t)$ by constant functions $\delta_{xt}$. This time the functions we shall replace are functions of the form $H^A$ with $A \in \mathcal{K}$.

§5. A counting lemma for quasirandom chains. 

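Just before we prove our main result, we isolate a simple statement that is needed in the proof and that helps to explain some of our choices in the definition of $(\epsilon, \mathcal{J}, k)$-quasirandom chains. For convenience, we briefly recall the definition here. We constructed a sequence $\epsilon_k, \epsilon_{k-1}, \dots, \epsilon_1$ by letting $\epsilon_k = \epsilon$ and

$$\epsilon_{k-j} = 2^{-jk-1}|\mathcal{J}|^{-1}\Bigl(\epsilon_{k-j+1}\prod_{A\in\mathcal{J},\ |A|\ge k-j+1}\delta_A\Bigr)^{2^{jk}}$$

when $j \ge 1$. We also defined $\eta_{k-j}$ by the formula

$$\eta_{k-j} = (1/2)\Bigl(\epsilon_{k-j}\prod_{A\in\mathcal{J},\ |A|\ge k-j}\delta_A\Bigr)^{2^{k(j+1)}}$$

for each $j$. Finally, we declared $\mathcal{H}$ to be $(\epsilon, \mathcal{J}, k)$-quasirandom if, for every $A \in \mathcal{J}$ of size $j \le k$, the hypergraph $H(A)$ was $\eta_j$-quasirandom relative to $H_*(A)$.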
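
To get a feeling for how quickly these parameters shrink, here is a small Python sketch that evaluates the two recursions in logarithmic form. It assumes the recursions read $\epsilon_{k-j} = 2^{-jk-1}|\mathcal{J}|^{-1}(\epsilon_{k-j+1}\prod_{|A|\ge k-j+1}\delta_A)^{2^{jk}}$ and $\eta_{k-j} = (1/2)(\epsilon_{k-j}\prod_{|A|\ge k-j}\delta_A)^{2^{k(j+1)}}$; the function name, the argument names and the sample densities are ours and purely illustrative.

```python
import math


def quasirandomness_parameters(k, eps, chain_size, log2_density_by_size):
    """Return (log2_eps, log2_eta), dictionaries mapping an index m to the
    base-2 logarithm of eps_m and eta_m respectively.

    log2_density_by_size[m] is the sum of log2(delta_A) over the sets A of
    the chain with |A| = m (an assumed input format, chosen for brevity).
    """
    def log2_tail(m):
        # log2 of the product of delta_A over all sets A with |A| >= m
        return sum(v for size, v in log2_density_by_size.items() if size >= m)

    log2_eps = {k: math.log2(eps)}
    for j in range(1, k):
        inner = log2_eps[k - j + 1] + log2_tail(k - j + 1)
        log2_eps[k - j] = (-j * k - 1 - math.log2(chain_size)
                           + (2 ** (j * k)) * inner)

    log2_eta = {}
    for j in range(0, k):
        inner = log2_eps[k - j] + log2_tail(k - j)
        log2_eta[k - j] = -1 + (2 ** (k * (j + 1))) * inner
    return log2_eps, log2_eta


# Toy input: k = 2, eps = 1/2, a chain with 4 sets, densities multiplying to
# 1/4 over the sets of size 1 and to 1/2 over the sets of size 2.
log2_eps, log2_eta = quasirandomness_parameters(2, 0.5, 4, {1: -2.0, 2: -1.0})
```

Even in this toy case the smallest parameter is about $2^{-257}$, which illustrates why the bounds that come out of the argument are of tower type.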

These parameters are chosen in order to satisfy some assumptions required in the 
inductive step of Theorem 5.2 below. The next lemma establishes that they do indeed 
satisfy them. 

Lemma 5.1. Let $\mathcal{J}$ and $\mathcal{H}$ be chains and suppose that $\mathcal{H}$ is $(\epsilon, \mathcal{J}, k)$-quasirandom. Let $\mathcal{K}$ be a chain with the same vertex set as that of $\mathcal{J}$, and suppose that there is a homomorphism from $\mathcal{K}$ to $\mathcal{J}$ such that each set in $\mathcal{J}$ has at most $2^k$ preimages. Let $\epsilon_k, \epsilon_{k-1}, \dots, \epsilon_1$ be the sequence defined above. Then $\mathcal{H}$ is $(\epsilon_{k-1}, \mathcal{K}, k-1)$-quasirandom.

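Proof. Let $\theta = \epsilon_{k-1}$ and define a sequence $\theta_{k-1}, \theta_{k-2}, \dots$ by taking $\theta_{k-1} = \theta$ and

$$\theta_{k-1-j} = 2^{-j(k-1)-1}|\mathcal{K}|^{-1}\Bigl(\theta_{k-j}\prod_{A\in\mathcal{K},\ |A|\ge k-j}\delta_A\Bigr)^{2^{j(k-1)}}.$$

Suppose that $\theta_{k-j} \ge \epsilon_{k-j}$. We also know that $|\mathcal{K}|^{-1} \ge 2^{-k}|\mathcal{J}|^{-1}$ and that

$$\prod_{A\in\mathcal{K},\ |A|\ge k-j}\delta_A \ge \Bigl(\prod_{A\in\mathcal{J},\ |A|\ge k-j}\delta_A\Bigr)^{2^k}.$$

It follows that

$$\theta_{k-1-j} \ge 2^{-(j+1)k-1}|\mathcal{J}|^{-1}\Bigl(\epsilon_{k-j}\prod_{A\in\mathcal{J},\ |A|\ge k-j}\delta_A\Bigr)^{2^{(j+1)k}} = \epsilon_{k-1-j}.$$

Therefore by induction $\theta_j \ge \epsilon_j$ for every $j$.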

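Now let $j$ be an integer between $0$ and $k-1$. Then

$$\begin{aligned}
\eta_{k-1-j} = \eta_{k-(j+1)} &= (1/2)\Bigl(\epsilon_{k-(j+1)}\prod_{A\in\mathcal{J},\ |A|\ge k-(j+1)}\delta_A\Bigr)^{2^{k(j+2)}}\\
&\le (1/2)\Bigl(\theta_{k-(j+1)}\Bigl(\prod_{A\in\mathcal{J},\ |A|\ge k-(j+1)}\delta_A\Bigr)^{2^k}\Bigr)^{2^{k(j+1)}}\\
&\le (1/2)\Bigl(\theta_{k-(j+1)}\prod_{A\in\mathcal{K},\ |A|\ge k-(j+1)}\delta_A\Bigr)^{2^{k(j+1)}}\\
&\le (1/2)\Bigl(\theta_{k-1-j}\prod_{A\in\mathcal{K},\ |A|\ge k-1-j}\delta_A\Bigr)^{2^{(k-1)(j+1)}}.
\end{aligned}$$

This is the formula for $\eta_{k-j}$ except that $k$ has been replaced by $k-1$, $\mathcal{J}$ by $\mathcal{K}$, and $\epsilon_{k-j}$ by $\theta_{k-1-j}$. It follows that $\mathcal{H}$ is $(\epsilon_{k-1}, \mathcal{K}, k-1)$-quasirandom, as claimed. □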

In the next theorem and its proof, we shall discuss two chains $\mathcal{J}$ and $\mathcal{H}$ and borrow notation from the previous section without redefining it. For example, $\tau$ is once again a sequence $(x_1, \dots, x_t)$ that enumerates variables that are indexed by the vertices of $\mathcal{J}$. Eventually, we will be interested in the case where every function $g^A$ is just $H^A$, but this more general statement is needed for an inductive argument to work, and is also of some interest in its own right.

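Theorem 5.2. Let $\mathcal{J}$ and $\mathcal{H}$ be $r$-partite chains as described at the beginning of the previous section. Let $\mathcal{J}_1$ be a subchain of $\mathcal{J}$ and for each $A \in \mathcal{J}_1$ let $g^A$ be an $A$-function supported in $\mathcal{H}$. Suppose that the maximum cardinality of any set in $\mathcal{J}\setminus\mathcal{J}_1$ is $k$ and that $\mathcal{H}$ is $(\epsilon, \mathcal{J}, k)$-quasirandom. Then

$$\Bigl|\mathbb{E}_\tau\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}\setminus\mathcal{J}_1}H^A(\tau) - \mathbb{E}_\tau\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}\setminus\mathcal{J}_1}\delta_A\Bigr| \le \epsilon\,|\mathcal{J}\setminus\mathcal{J}_1|\prod_{A\in\mathcal{J}}\delta_A.$$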

Proof. This result tells us that we can replace the functions $H^A$ in the quantity $\mathbb{E}_\tau\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}\setminus\mathcal{J}_1}H^A(\tau)$ by their relative densities $\delta_A$ without changing the quantity by too much. This is proved by two levels of induction, for the following reason. First of all, we do our replacements one by one, and this leads to an induction on the cardinality of $\mathcal{J}\setminus\mathcal{J}_1$. However, in order to establish an upper bound for the error introduced when we make a replacement, we use our main lemma, Lemma 4.1, which results in a similar expression to the one we were initially trying to bound, but with new chains $\mathcal{K}$ and $\mathcal{K}_1$. These chains are considerably bigger than $\mathcal{J}$ and $\mathcal{J}_1$, but the largest set in $\mathcal{K}\setminus\mathcal{K}_1$ is smaller than the largest set in $\mathcal{J}\setminus\mathcal{J}_1$, so we can use induction on $k$ to replace the error term itself by a quantity that will turn out to be small as a direct consequence of the quasirandomness of the chain $\mathcal{H}$.

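Let us therefore choose a maximal set $A_0$ in $\mathcal{J}\setminus\mathcal{J}_1$ and try to replace $H^{A_0}(\tau)$ by $\delta_{A_0}$ in the expression $\mathbb{E}_\tau\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}\setminus\mathcal{J}_1}H^A(\tau)$ while introducing only a small error. Letting $\mathcal{J}_0 = \mathcal{J}\setminus\{A_0\}$, the difference between the original expression and the new expression is

$$\mathbb{E}_\tau f(\tau)\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}_0\setminus\mathcal{J}_1}H^A(\tau),$$

where $f$ is the $A_0$-function defined by $f(\tau) = (H^{A_0}(\tau) - \delta_{A_0})\prod_{A\subsetneq A_0}H^A(\tau)$. (This function was first defined near the end of subsection 3.6: in the notation of this section it equals $1 - \delta_{A_0}$ if $\tau(A_0) \in H(A_0)$, $-\delta_{A_0}$ if $\tau(A_0) \in H_*(A_0)\setminus H(A_0)$, and zero otherwise.)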

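Without loss of generality, we may assume that $A_0$ is the set $\{1, 2, \dots, k\}$. Let us therefore apply Lemma 4.1 to this function $f$ and to the chain $\mathcal{J}_0$. It yields for us an $r$-partite $(k-1)$-chain $\mathcal{K}'$ and a homomorphism $\gamma$ from $\mathcal{K}'$ to $\mathcal{J}_0$ such that every set in $\mathcal{J}_0$ of cardinality less than $k$ has $2^k$ preimages, and such that we have the inequality

$$\Bigl(\mathbb{E}_\tau f(\tau)\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}_0\setminus\mathcal{J}_1}H^A(\tau)\Bigr)^{2^k} \le \mathbb{E}_\omega f_\sigma(\omega)\prod_{A\in\mathcal{K}'}H^A(\omega).$$

Recall that $f_\sigma(\omega)$ is the product of $f(\omega(A))$ over all sets $A$ of the form $\{(1, \epsilon_1), \dots, (k, \epsilon_k)\}$. Let $\mathcal{K}_1$ be the chain of all subsets of such sets and let $\mathcal{K} = \mathcal{K}_1 \cup \mathcal{K}'$. Then the largest set in $\mathcal{K}\setminus\mathcal{K}_1$ has size at most $k-1$. Moreover, by Lemma 5.1, $\mathcal{H}$ is $(\epsilon_{k-1}, \mathcal{K}, k-1)$-quasirandom. Therefore, by induction on $k$, we know that the right-hand side of the above inequality differs from $\mathbb{E}_\omega f_\sigma(\omega)\prod_{A\in\mathcal{K}\setminus\mathcal{K}_1}\delta_A$ by at most $\epsilon_{k-1}|\mathcal{K}\setminus\mathcal{K}_1|\prod_{A\in\mathcal{K}}\delta_A$. This is at most $\epsilon_{k-1}|\mathcal{K}\setminus\mathcal{K}_1|\prod_{A\in\mathcal{K}'}\delta_A$, which is equal to

$$\epsilon_{k-1}|\mathcal{K}\setminus\mathcal{K}_1|\Bigl(\prod_{A\in\mathcal{J}_0,\ |A|<k}\delta_A\Bigr)^{2^k}.$$

But $|\mathcal{K}\setminus\mathcal{K}_1| \le |\mathcal{K}| \le 2^k|\mathcal{J}|$ and $\epsilon_{k-1} = 2^{-k-1}|\mathcal{J}|^{-1}\bigl(\epsilon_k\prod_{A\in\mathcal{J},\ |A|\ge k}\delta_A\bigr)^{2^k}$, so this is at most

$$(1/2)\Bigl(\epsilon_k\prod_{A\in\mathcal{J}}\delta_A\Bigr)^{2^k}.$$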
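As for $\mathbb{E}_\omega f_\sigma(\omega)\prod_{A\in\mathcal{K}\setminus\mathcal{K}_1}\delta_A$, it is equal (by definition) to $\mathrm{Oct}(f)\prod_{A\in\mathcal{K}\setminus\mathcal{K}_1}\delta_A$. By hypothesis, $f$ is $\eta_k$-quasirandom, which means that $\mathrm{Oct}(f) \le \eta_k\prod_{A\in\mathcal{K}_1}\delta_A$. Since $\eta_k = (1/2)\bigl(\epsilon_k\prod_{A\in\mathcal{J},\ |A|\ge k}\delta_A\bigr)^{2^k}$, it follows that

$$\mathbb{E}_\omega f_\sigma(\omega)\prod_{A\in\mathcal{K}\setminus\mathcal{K}_1}\delta_A \le \eta_k\prod_{A\in\mathcal{K}}\delta_A \le \eta_k\prod_{A\in\mathcal{K}'}\delta_A \le (1/2)\Bigl(\epsilon_k\prod_{A\in\mathcal{J}}\delta_A\Bigr)^{2^k}.$$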

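Putting these two estimates together, we find that

$$\Bigl|\mathbb{E}_\tau f(\tau)\prod_{A\in\mathcal{J}_1}g^A(\tau)\prod_{A\in\mathcal{J}_0\setminus\mathcal{J}_1}H^A(\tau)\Bigr| \le \epsilon_k\prod_{A\in\mathcal{J}}\delta_A.$$

Thus, returning to the beginning of the proof, we have shown that replacing $H^{A_0}$ by $\delta_{A_0}$ for any maximal element $A_0$ of $\mathcal{J}\setminus\mathcal{J}_1$ results in an error of at most $\epsilon_k\prod_{A\in\mathcal{J}}\delta_A$. Therefore the result follows by induction on $|\mathcal{J}\setminus\mathcal{J}_1|$ and the triangle inequality (and the fact that $\epsilon_k = \epsilon$). □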

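If we now consider the case when $\mathcal{J}_1$ is empty, then we obtain the following corollary, which is the counting lemma that we have been aiming for.

Corollary 5.3. Let $\mathcal{J}$ and $\mathcal{H}$ be $r$-partite chains with vertex sets $E_1 \cup \dots \cup E_r$ and $X_1 \cup \dots \cup X_r$, respectively. Let $k$ be the size of the largest set in $\mathcal{J}$ and suppose that $\mathcal{H}$ is $(\epsilon/|\mathcal{J}|, \mathcal{J}, k)$-quasirandom. Let $\tau$ be a random $r$-partite map from $E_1 \cup \dots \cup E_r$ to $X_1 \cup \dots \cup X_r$. Then

$$\Bigl|\mathbb{P}[\tau \in \mathrm{Hom}(\mathcal{J}, \mathcal{H})] - \prod_{A\in\mathcal{J}}\delta_A\Bigr| \le \epsilon\prod_{A\in\mathcal{J}}\delta_A.\qquad\Box$$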

In less precise terms, this says that if $\mathcal{J}$ is a small $r$-partite chain and $\mathcal{H}$ is a sufficiently quasirandom $r$-partite chain, then a random $r$-partite map from the vertices of $\mathcal{J}$ to the vertices of $\mathcal{H}$ will be a homomorphism with approximately the probability that you would expect if $\mathcal{H}$ was a random chain with the given relative densities.
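
The simplest instance of this statement is the counting lemma for triangles in a tripartite graph. The following Python sketch is an illustration of ours rather than part of the argument: it builds a random tripartite graph (which is quasirandom with high probability), computes exactly the probability that a random map from a triangle is a homomorphism, and compares it with the product of the three edge densities. All names and parameter values are illustrative choices.

```python
import itertools
import random


def random_tripartite_graph(n, p, seed):
    """Three vertex classes X, Y, Z of size n; each cross pair is an edge
    independently with probability p."""
    rng = random.Random(seed)
    edges = {}
    for pair in (("X", "Y"), ("Y", "Z"), ("X", "Z")):
        edges[pair] = {(u, v) for u in range(n) for v in range(n)
                       if rng.random() < p}
    return edges


def density(edge_set, n):
    return len(edge_set) / (n * n)


def triangle_hom_probability(edges, n):
    """Exact probability that a uniformly random triple (x, y, z) spans a
    triangle, i.e. that a random tripartite map is a homomorphism."""
    xy, yz, xz = edges[("X", "Y")], edges[("Y", "Z")], edges[("X", "Z")]
    hits = sum(1 for x, y, z in itertools.product(range(n), repeat=3)
               if (x, y) in xy and (y, z) in yz and (x, z) in xz)
    return hits / n ** 3


n = 30
edges = random_tripartite_graph(n, 0.5, seed=0)
observed = triangle_hom_probability(edges, n)
predicted = (density(edges[("X", "Y")], n)
             * density(edges[("Y", "Z")], n)
             * density(edges[("X", "Z")], n))
```

For a graph of this kind `observed` and `predicted` should be close; for a graph that is far from quasirandom (for example, one in which all edges lie inside a few small blocks) they can differ substantially.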

§6. Local increases in mean-square density.

All known proofs of Szemeredi's theorem use (explicitly or implicitly) an approach of 
the following kind. Given a dense set that fails to be quasirandom in some appropriate 
sense, one can identify imbalances in the set that allow one to divide it into pieces that 
"improve" in some way, on average at least, on the set itself. One then iterates this 
argument until one reaches sets that are quasirandom. At that point one uses some kind
of counting lemma to prove that they contain an arithmetic progression of length k. 

This proof is no exception. We have defined a notion of quasirandomness and proved a 
counting lemma for it. Now we must see what happens when some parts of a chain are not 
relatively quasirandom. We shall end up proving a regularity lemma, which says, roughly 
speaking, that any dense chain can be divided up into a bounded number of pieces, almost 
all of which are quasirandom. This generalizes Szemeredi's regularity lemma for graphs 
(which formed part of his proof of his theorem on arithmetic progressions). 

Given a dense graph $G$ and a positive real number $\epsilon$, Szemerédi's regularity lemma asserts that the vertices of $G$ can be partitioned into $K$ classes of roughly equal size, with $K$ bounded above by a function of $\epsilon$ only, in such a way that, proportionately speaking, at least $1 - \epsilon$ of the bipartite graphs spanned by two of these classes are $\epsilon$-regular. (One can insist that $K$ is much bigger than $\epsilon^{-1}$, so it is not necessary to worry about the case where the two classes are equal. Or it can be neater to say that two equal classes form a "regular pair" if they span a quasirandom graph.)

Very roughly, the proof is as follows. Suppose you have a graph $G$ and a partition of its vertex set. Then either this partition will do or there are many pairs of cells from the partition that give rise to induced bipartite subgraphs of $G$ that are not $\epsilon$-quasirandom. If $X$ and $Y$ are two disjoint sets of vertices, write $G(X, Y)$ for the corresponding induced bipartite subgraph of $G$. Suppose that $X$ and $Y$ are two cells of the partition, for which $G(X, Y)$ is not $\epsilon$-regular. Then there are large subsets $X(0) \subset X$ and $Y(0) \subset Y$ for which the density of $G(X(0), Y(0))$ is substantially different from that of $G(X, Y)$. Letting $X(1) = X \setminus X(0)$ and $Y(1) = Y \setminus Y(0)$, we have obtained partitions of $X$ and $Y$ into two sets each, in such a way that the densities of the graphs $G(X(i), Y(j))$ are not almost all approximately the same as that of $G(X, Y)$. One can then define an appropriately weighted average of the squares of these four densities and show that this average is greater than the square of the density of $G(X, Y)$. Let us call this stage one of the argument, the stage where we identify a "local" increase in mean-square density.
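
To make stage one concrete, here is a minimal Python sketch, under the assumption that the weight attached to $G(X(i), Y(j))$ is $|X(i)||Y(j)|/|X||Y|$. It compares the square of the density of a small bipartite graph with the weighted average of the squares of the four densities obtained from a splitting. The example graph and the function names are ours.

```python
def block_density(adj, rows, cols):
    """Edge density of the bipartite graph restricted to rows x cols."""
    rows, cols = list(rows), list(cols)
    return sum(adj[x][y] for x in rows for y in cols) / (len(rows) * len(cols))


def local_mean_square_density(adj, x0, y0):
    """Weighted average of the squared densities of the four graphs
    G(X(i), Y(j)), where X(0) = x0, X(1) is its complement, and similarly
    for Y."""
    n_rows, n_cols = len(adj), len(adj[0])
    x_parts = [set(x0), set(range(n_rows)) - set(x0)]
    y_parts = [set(y0), set(range(n_cols)) - set(y0)]
    total = 0.0
    for xs in x_parts:
        for ys in y_parts:
            if xs and ys:  # empty cells carry no weight
                weight = len(xs) * len(ys) / (n_rows * n_cols)
                total += weight * block_density(adj, xs, ys) ** 2
    return total


# A very irregular pair: all edges sit inside X(0) x Y(0).
adj = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
overall_squared = block_density(adj, range(4), range(4)) ** 2   # (1/4)^2
refined = local_mean_square_density(adj, {0, 1}, {0, 1})        # 1/4
```

Here the square of the density is $1/16$, while the weighted average of the four squared densities is $1/4$. By the Cauchy-Schwarz inequality the second quantity can never be smaller than the first; the point of stage one is that irregularity forces it to be noticeably larger.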

It remains to turn these local increases into a global increase. This, which we shall call stage two, is quite simple. Denote the cells of the original partition by $X_1, \dots, X_k$. For each pair $(X_i, X_j)$ that fails to be $\epsilon$-regular, use the above argument to partition $X_i$ into two sets $X_{ij}(0)$ and $X_{ij}(1)$, and to partition $X_j$ into two sets $X_{ji}(0)$ and $X_{ji}(1)$. Then for each $i$ find a partition of $X_i$ that refines all the partitions $\{X_{ij}(0), X_{ij}(1)\}$. The result is a partition into $m$ sets $Y_1, \dots, Y_m$ (where $m$ is bounded above in terms of $k$) that refines the partition $\{X_1, \dots, X_k\}$. It can be shown that the average of the squares of the densities of the graphs $G(Y_i, Y_j)$, again with appropriate weights, is significantly greater than it was for the partition $\{X_1, \dots, X_k\}$. Therefore, if one iterates the procedure, the iteration must terminate after a number of steps that can be bounded in terms of $\epsilon$. It can terminate only if almost all the graphs $G(X_i, X_j)$ are quasirandom, so the result is proved.

We have given this sketch since our generalized regularity lemma will be proved in 
a similar way. There are two main differences. First, it is an unfortunate fact of life 
that, when one is dealing with $k$-chains rather than graphs, simple arguments have to be
expressed in terminology that can obscure their simplicity. For example, even defining the 
appropriate notion of a "partition" of a chain is somewhat complicated. Thus, stage two 
of our argument, although it is an "obvious" generalization of stage two of the proof of the 
usual regularity lemma, is noticeably more complicated to write down. 

A more fundamental difference, however, is that our stage one is not completely straightforward, and here the difference is mathematical rather than merely notational. The reason is that we do not generalize Szemerédi's regularity lemma as it is stated above, but rather a simple variant of it where rather than obtaining $\epsilon$-regular pairs we obtain $\epsilon$-quasirandom pairs. For dense bipartite graphs, these two notions are equivalent (give or take changes in $\epsilon$), but when one generalizes them to hypergraphs that live in sparse chains they diverge in a significant way. Some hint of this can already be seen above. It is true by definition that if a pair $(X, Y)$ is not $\epsilon$-regular, then there are large subsets $X(0) \subset X$ and $Y(0) \subset Y$ for which the density of $G(X(0), Y(0))$ is substantially different from that of $G(X, Y)$. However, if we assume instead that $G(X, Y)$ is not $\epsilon$-quasirandom, then there is something to prove. The proof is very simple in the dense case, and even in the sparse case, but in the latter it yields sets $X(0)$ and $Y(0)$ that are very small. As a result, we have to work significantly harder in order to obtain a partition with a good enough local increase in mean-square density. Roughly speaking, our approach will be to find many pairs of such sets, and build a partition out of those. For this to work it is important that the pairs are sufficiently spread out: the detailed argument will occupy the rest of the section.

Incidentally, the last paragraph describes the main difference between our approach and that of Nagle, Rödl, Schacht and Skokan. Their definitions generalize that of $\epsilon$-regularity of bipartite graphs, so stage one of the proof of the regularity lemma is easier for them. However, they have to pay for this when they prove their counting lemma: $\epsilon$-regularity is a weaker property than $\epsilon$-quasirandomness, so if you use it as your basic definition then it is easier to deduce facts about objects that are not $\epsilon$-regular but harder to deduce facts about objects that are $\epsilon$-regular.

We shall now work towards our stage one, which will be Lemma 6.3 below. To begin with, let us say what we mean by the mean-square density of a function with respect to a partition. Let $U$ be a set of size $n$, let $f : U \to \mathbb{R}$ and let $B_1, \dots, B_r$ be sets that form a partition of $U$. Then the mean-square density of $f$ with respect to the partition $\{B_1, \dots, B_r\}$ is

$$\sum_{i=1}^r \frac{|B_i|}{n}\Bigl(\mathbb{E}_{x\in B_i}f(x)\Bigr)^2.$$

If we write $p_i$ for $|B_i|/n$ (which it is helpful to think of as the probability that a random $x \in U$ is an element of $B_i$) and $d_i$ for $\mathbb{E}_{x\in B_i}f(x)$ (that is, the expectation, or "density", of $f$ in $B_i$) then this sum is $\sum_{i=1}^r p_i d_i^2$, a weighted average of the squared densities $d_i^2$, with respect to the obvious system of weights $p_i$.
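
As a quick illustration, the definition can be computed directly. The sketch below is ours (the function name and the example data are illustrative); it represents $f$ as a dictionary and the partition as a list of lists.

```python
def mean_square_density(f, partition):
    """Sum over cells B of (|B|/n) * (average of f over B)^2."""
    n = sum(len(cell) for cell in partition)
    total = 0.0
    for cell in partition:
        if not cell:
            continue  # an empty cell has weight zero
        d = sum(f[x] for x in cell) / len(cell)  # density of f in the cell
        total += (len(cell) / n) * d * d
    return total


f = {0: 1, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0}
coarse = mean_square_density(f, [[0, 1, 2, 3, 4, 5]])    # (1/2)^2
fine = mean_square_density(f, [[0, 1], [2, 3], [4, 5]])  # (1 + 0 + 1/4)/3
```

With the trivial partition the mean-square density is just the square of the mean of $f$; refining the partition can only increase it, and here it rises from $1/4$ to $5/12$.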

The following two simple lemmas are very slight modifications of lemmas in [G2] . The 
first is our main tool, while the second is more of a technical trick that will be used in 
Lemma 6.3. 

Lemma 6.1. Let $U$ be a finite set and let $f$ and $g$ be functions from $U$ to the interval $[-1, 1]$. Let $B_1, \dots, B_r$ be a partition of $U$ and suppose that $g$ is constant on each $B_i$. Then the mean-square density of $f$ with respect to the partition $B_1, \dots, B_r$ is at least $\langle f, g\rangle^2/\|g\|_2^2$.

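Proof. For each $j$ let $a_j$ be the value taken by $g$ on the set $B_j$. Then, with $p_j$ and $d_j$ as above, the Cauchy-Schwarz inequality gives

$$\langle f, g\rangle^2 = \Bigl(\sum_j p_j a_j d_j\Bigr)^2 \le \Bigl(\sum_j p_j a_j^2\Bigr)\Bigl(\sum_j p_j d_j^2\Bigr).$$

The first part of the product is $\|g\|_2^2$ and the second is the mean-square density of $f$, from which the lemma follows. □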
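
The inequality of Lemma 6.1 is easy to test numerically. The following sketch is ours: it assumes that the inner product and the norm are normalized as averages over $U$, and it uses arbitrary random data.

```python
import random


def lemma_6_1_quantities(f, g_on_cells, partition):
    """Return (mean-square density of f, <f, g>, ||g||_2^2), where g takes
    the value g_on_cells[j] on the cell partition[j] and both the inner
    product and the norm are averages over U."""
    n = sum(len(cell) for cell in partition)
    msd = inner = norm_sq = 0.0
    for cell, a in zip(partition, g_on_cells):
        s = sum(f[x] for x in cell)
        msd += s * s / (len(cell) * n)   # (|B|/n) * (s/|B|)^2
        inner += a * s / n
        norm_sq += a * a * len(cell) / n
    return msd, inner, norm_sq


rng = random.Random(1)
partition = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
f = [rng.uniform(-1, 1) for _ in range(9)]
g_on_cells = [0.5, -1.0, 0.25]
msd, inner, norm_sq = lemma_6_1_quantities(f, g_on_cells, partition)
lower_bound = inner ** 2 / norm_sq
```

Whatever values are chosen for `f` and `g_on_cells` (with `g` not identically zero), `msd` is at least `lower_bound`, with equality when the cell averages of `f` are proportional to the values of `g`.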

In the next lemma, $\mathbb{E}_i v_i$ and $\mathbb{E}_j w_j$ mean the obvious thing: they are $n^{-1}\sum_{i=1}^n v_i$ and $r^{-1}\sum_{j=1}^r w_j$ respectively.

Lemma 6.2. Let $n$ be a positive integer, let $0 < \delta < 1$ and let $r$ be an integer greater than or equal to $\delta^{-1}$. Let $v_1, \dots, v_n$ be vectors in a Hilbert space such that $\|v_i\| \le 1$ for each $i$ and such that $\|\mathbb{E}_i v_i\|^2 \le \delta$. Let $r$ vectors $w_1, \dots, w_r$ be chosen uniformly and independently from the $v_i$. (To be precise, for each $w_j$ an index $i$ is chosen randomly between $1$ and $n$ and $w_j$ is set equal to $v_i$.) Then the expectation of $\|\mathbb{E}_j w_j\|^2$ is at most $2\delta$.

Proof. The expectation of $\|\mathbb{E}_j w_j\|^2$ is the expectation of $\mathbb{E}_{i,j}\langle w_i, w_j\rangle$. If $i \ne j$ then the expectation of $\langle w_i, w_j\rangle$ is $\|\mathbb{E}_i v_i\|^2$ which, by hypothesis, is at most $\delta$. If $i = j$, then $\langle w_i, w_j\rangle$ is at most $1$, again by hypothesis. Therefore, the expectation we are trying to bound is at most $r^{-2}(\delta r(r-1) + r)$. Since $\delta r \ge 1$, this is at most $2\delta$, as claimed. □
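
For small $n$ and $r$ the expectation in Lemma 6.2 can be computed exactly by enumerating all $n^r$ choices. The sketch below is ours: it uses four vectors in the plane, chosen arbitrarily, and compares the enumerated value with the closed form $r^{-2}\bigl(r\,\mathbb{E}_i\|v_i\|^2 + r(r-1)\|\mathbb{E}_i v_i\|^2\bigr)$ that underlies the proof.

```python
import itertools


def expected_squared_norm_of_sample_mean(vectors, r):
    """Exact expectation of ||(w_1 + ... + w_r)/r||^2 when each w_j is chosen
    uniformly and independently from `vectors` (brute-force enumeration)."""
    n, dim = len(vectors), len(vectors[0])
    total = 0.0
    for choice in itertools.product(range(n), repeat=r):
        mean = [sum(vectors[i][c] for i in choice) / r for c in range(dim)]
        total += sum(m * m for m in mean)
    return total / n ** r


def closed_form(vectors, r):
    n, dim = len(vectors), len(vectors[0])
    avg_sq_norm = sum(sum(x * x for x in v) for v in vectors) / n
    mean = [sum(v[c] for v in vectors) / n for c in range(dim)]
    sq_norm_of_mean = sum(m * m for m in mean)
    return (r * avg_sq_norm + r * (r - 1) * sq_norm_of_mean) / r ** 2


vectors = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -0.5)]
delta, r = 1 / 3, 3   # r >= 1/delta, and ||E_i v_i||^2 = 1/64 <= delta
value = expected_squared_norm_of_sample_mean(vectors, r)
```

In this example `value` equals $0.28125$, comfortably below the bound $2\delta = 2/3$.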

Before we state the main result of this section, we need two definitions. The first is of a chain $\mathcal{D}$ that we shall call a double octahedron. We use this name for conciseness even though it is slightly misleading: in fact, $\mathcal{D}$ is the $(k-1)$-skeleton of a chain formed from two $k$-dimensional octahedra by identifying a face from one with the corresponding face from the other. To put this more formally, take the vertex set of $\mathcal{D}$ to be the set $[k] \times \{0, 1, 2\}$. For each $i$ between $1$ and $k$ let $V_i$ be the set $\{i\} \times \{0, 1, 2\}$ and for $j = 0, 1, 2$ let $B_j$ be the set $[k] \times \{j\}$. The edges of $\mathcal{D}$ are all sets $B$ of cardinality at most $k-1$ such that $|B \cap V_i| \le 1$ for every $i$ and at least one of $B \cap B_1$ and $B \cap B_2$ is empty. (The two octahedra in question are $O_1$ and $O_2$, where $O_j$ consists of all sets $B \subset B_0 \cup B_j$ such that $|B \cap V_i| \le 1$ for every $i$.)

Notice that if $A \subset [k]$ is a set of size at most $k-1$ then the number of edges in $\mathcal{D}$ of index $A$ is $2^{|A|+1} - 1$, since there are $2^{|A|}$ edges from each octahedron and one, namely $A \times \{0\}$, which is common to both.
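
This count is easy to confirm by brute force for small $k$. The following Python sketch is ours: it enumerates the edges of the double octahedron directly from the definition, with a vertex $(i, j)$ represented as a pair, and tallies them by index.

```python
from itertools import combinations, product


def double_octahedron_edges(k):
    """All sets B in [k] x {0, 1, 2} with |B| <= k - 1, at most one vertex in
    each V_i, and not meeting both B_1 and B_2."""
    edges = []
    for size in range(0, k):  # cardinalities 0, 1, ..., k - 1
        for index in combinations(range(1, k + 1), size):
            for layers in product((0, 1, 2), repeat=size):
                if 1 in layers and 2 in layers:
                    continue  # would meet both B_1 and B_2
                edges.append(frozenset(zip(index, layers)))
    return edges


def edge_counts_by_index(k):
    """Map each index set A to the number of edges of index A."""
    counts = {}
    for edge in double_octahedron_edges(k):
        index = frozenset(i for i, _ in edge)
        counts[index] = counts.get(index, 0) + 1
    return counts


counts = edge_counts_by_index(4)
formula_holds = all(c == 2 ** (len(a) + 1) - 1 for a, c in counts.items())
```

For $k = 4$ every index set $A$ of size at most $3$ is the index of exactly $2^{|A|+1} - 1$ edges, the empty set included.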

For the second definition, suppose we have a $k$-partite $(k-1)$-chain $\mathcal{H}$ with vertex sets $X_1, \dots, X_k$. Recall from §2 that $H_*([k])$ is the collection of all sets $A$ such that $|A \cap X_i| = 1$ for every $i$ and such that every proper subset of $A$ belongs to $\mathcal{H}$. For this second condition to hold it is enough for $C$ to be an edge of $\mathcal{H}$ whenever $C \subset A$ and $|C| = k-1$. Let $H$ be the $k$-partite $(k-1)$-uniform hypergraph consisting of all edges of $\mathcal{H}$ of size $k-1$. For $1 \le i \le k$ let $H_i$ be the $(k-1)$-partite subhypergraph of $H$ consisting of all edges of $H$ that have empty intersection with $X_i$. We shall call the hypergraphs $H_i$ the parts of $H$. Each set $A \in H_*([k])$ has $k$ subsets of size $k-1$. Each part $H_i$ of $H$ contains exactly one of these subsets, namely $A \setminus X_i$.

Suppose that each $H_i$ is partitioned into subhypergraphs $H_{i1}, \dots, H_{ir_i}$. These partitions give rise to an equivalence relation $\sim$ on $H_*([k])$: we say that $A \sim A'$ if, for each $i \le k$, the sets $A \setminus X_i$ and $A' \setminus X_i$ belong to the same cell $H_{ij}$ of the partition of $H_i$. The corresponding partition will be called the induced partition of $H_*([k])$.
Lemma 6.3. Let $\mathcal{H}$ be a $k$-partite $(k-1)$-chain with vertex sets $X_1, \dots, X_k$, let $\mathcal{D}$ be the double octahedron, let $\delta = \prod_{A\in\mathcal{D}}\delta_A$ and let $r \ge \delta^{-1}$ be a positive integer. Suppose that $\epsilon \le |\mathcal{D}|^{-1}$, that $\mathcal{H}$ is $(\epsilon, \mathcal{D}, k-1)$-quasirandom and that $f : H_*([k]) \to [-1, 1]$ is a function that is not $\eta$-quasirandom relative to $\mathcal{H}$. Let $H$ be the set of all edges of $\mathcal{H}$ of size $k-1$ and let $H_1, \dots, H_k$ be the $k$ parts of $H$. Then there are partitions of the $H_i$ into at most $3^r$ sets each such that the mean-square density of $f$ with respect to the induced partition of $H_*([k])$ is at least $\eta^2/32$.

We shall prove Lemma 6.3 in stages, by means of some intermediate lemmas (Lemmas 
6.4-6.7 below). Since these lemmas form part of a larger proof, we shall not state each one 
in full: rather, if we have already introduced notation such as names for various functions
we shall feel free to use it again without redefining it.

But before we get on to the subsidiary lemmas, let us examine our main hypothesis, that $f$ is not $\eta$-quasirandom relative to $\mathcal{H}$. For each $i \le k$ let $U_i = \{i\} \times \{0, 1\}$ (so $U_i$ consists of the "first two" of the three elements of $V_i$). As in §3, let $\mathcal{B}$ be the $k$-partite $k$-uniform hypergraph consisting of all sets $B \subset U_1 \cup \dots \cup U_k$ such that $|B \cap U_i| = 1$ for every $i$, let $\mathcal{K}$ be the chain of all sets $C$ that are proper subsets of some $B \in \mathcal{B}$ and let $\Omega$ be the set of all $k$-partite maps from $U_1 \cup \dots \cup U_k$ to $X_1 \cup \dots \cup X_k$. Then to say that $f$ is not $\eta$-quasirandom relative to $\mathcal{H}$ is to say that

$$\mathrm{Oct}(f) = \mathbb{E}_{\omega\in\Omega}\prod_{B\in\mathcal{B}}f^B(\omega) > \eta\prod_{A\in\mathcal{K}}\delta_A,$$

where by $f^B(\omega)$ we mean $f(\omega(B))$ if $\omega(B) \in H_*([k])$ and $0$ otherwise.

Let $B_0$ and $B_1$ be as defined earlier, so that $U_1 \cup \dots \cup U_k = B_0 \cup B_1$. Let $\Phi$ and $\Psi$ be the set of all $k$-partite maps from $B_0$ and $B_1$ respectively, to $X_1 \cup \dots \cup X_k$. There is an obvious one-to-one correspondence between $\Omega$ and $\Phi \times \Psi$: given any $\omega \in \Omega$, associate with it the pair $(\phi, \psi)$ where $\phi$ and $\psi$ are the restrictions of $\omega$ to $B_0$ and $B_1$. This procedure is invertible: given a pair $(\phi, \psi)$, define a $k$-partite map $\omega$ by setting $\omega(x) = \phi(x)$ if $x \in B_0$ and $\omega(x) = \psi(x)$ if $x \in B_1$. From now on we shall identify $\Omega$ with $\Phi \times \Psi$ and freely pass from one to the other.

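Let us split the product $\prod_{B\in\mathcal{B}}f^B(\omega)$ into two parts. We shall write $F(\omega)$ for $f^{B_0}(\omega)$ and $G(\omega)$ for $\prod_{B\in\mathcal{B},\ B\ne B_0}f^B(\omega)$. If $\omega = (\phi, \psi)$ then $F(\omega)$ does not depend on $\psi$ (since it depends only on $\omega(B_0) = \phi(B_0)$). To emphasize this, we shall write $G(\phi, \psi)$ for $G(\omega)$ and $F(\phi)$ for $F(\omega)$. Our hypothesis now becomes

$$\mathbb{E}_{\phi\in\Phi}\mathbb{E}_{\psi\in\Psi}F(\phi)G(\phi, \psi) > \eta\prod_{A\in\mathcal{K}}\delta_A.\qquad(*)$$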
Let us see why this is useful. First, note that there is another obvious one-to-one correspondence, this time between $\Phi$ and $X_1 \times \dots \times X_k$. It associates with a map $\phi \in \Phi$ the $k$-tuple $(\phi(1, 0), \dots, \phi(k, 0))$, and the inverse associates with a $k$-tuple $(x_1, \dots, x_k) \in \prod_{i=1}^k X_i$ the map $\phi : B_0 \to X_1 \cup \dots \cup X_k$ that takes $(i, 0)$ to $x_i$ for each $i \le k$. Therefore, the function $F$ is basically another way of thinking about $f$. The inequality above can be regarded as saying that, for an average $\psi \in \Psi$, $F$ has a certain correlation with the function $G_\psi : \phi \mapsto G(\phi, \psi)$. This is significant, because the functions $G_\psi$ have a special form, as the next lemma shows.

Lemma 6.4. Each function $G_\psi : \Phi \to [-1, 1]$ defined above can be written as a product of $A$-functions over sets $A \subset B_0$ of size $k-1$.

Proof. By definition, $G_\psi(\phi) = \prod_{B\in\mathcal{B},\ B\ne B_0}f^B(\phi, \psi)$. Now $f^B(\phi, \psi)$ depends on $(\phi, \psi)(B) = \phi(B \cap B_0) \cup \psi(B \cap B_1)$ only. Therefore, if $\psi$ is fixed, $f^B(\phi, \psi)$ depends on $\phi(B \cap B_0)$ only. Thus, the function $\phi \mapsto f^B(\phi, \psi)$ is a $(B \cap B_0)$-function defined on $\Phi$. Since $B \ne B_0$, $|B \cap B_0| \le k-1$. This proves that $G_\psi$ is a product of $A$-functions over sets $A$ of size at most $k-1$. However, if $B \subset A$, then the product of a $B$-function with an $A$-function is still an $A$-function. From this simple observation it now follows that $G_\psi$ is a product of $A$-functions over sets $A$ of size equal to $k-1$. □

Our next task is to construct some new functions $E_\psi$ out of the $G_\psi$ that have very similar properties but take values $0$, $1$ and $-1$ only.

Lemma 6.5. If the inequality (*) holds, then there exist functions $E_\psi : \Phi \to \{-1, 0, 1\}$, one for each $\psi \in \Psi$, with the following properties. First, $E_\psi(\phi)$ is non-zero only if $(\phi, \psi) \in \mathrm{Hom}(\mathcal{K}, \mathcal{H})$. Second, each $E_\psi$ can be written as a product of $\{-1, 0, 1\}$-valued $A$-functions over subsets $A \subset B_0$ of size $k-1$. Third,

$$\mathbb{E}_{\phi\in\Phi}\mathbb{E}_{\psi\in\Psi}F(\phi)E_\psi(\phi) > \eta\prod_{A\in\mathcal{K}}\delta_A.$$

Proof. Let us fix $\psi \in \Psi$ and consider the function $G = G_\psi$. By Lemma 6.4 we can write it as a product of $A$-functions, where each $A$ in the product is a subset of $B_0$ of size $k-1$. There are $k$ such sets, namely $A_1, \dots, A_k$, where for each $i$ we set $A_i = B_0 \setminus \{(i, 0)\}$. So we can write $G(\phi) = \prod_{i=1}^k g_i(\phi)$ with $g_i$ an $A_i$-function for each $i$.

Now define an $A_i$-function $u_i : \Phi \to \{-1, 0, 1\}$ randomly in the following natural way. Say that two maps $\phi$ and $\phi'$ are equivalent if $\phi(A_i) = \phi'(A_i)$ and choose one map from each equivalence class. Let $\phi$ be one of these representatives. If $g_i(\phi) \ge 0$ then let $u_i(\phi)$ equal $1$ with probability $g_i(\phi)$ and $0$ with probability $1 - g_i(\phi)$. If $g_i(\phi) < 0$ then let $u_i(\phi)$ equal $-1$ with probability $-g_i(\phi)$ and $0$ with probability $1 + g_i(\phi)$. Then the expectation of $u_i(\phi)$ is $g_i(\phi)$. If $\phi'$ is equivalent to $\phi$ then let $u_i(\phi') = u_i(\phi)$.

Do the same for each equivalence class and make all the random choices independently. Finally, for each $\phi \in \Phi$ let $E_\psi(\phi) = \prod_{i=1}^k u_i(\phi)$.
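
The rounding step just described is a standard randomized rounding of a number in $[-1, 1]$ to a value in $\{-1, 0, 1\}$ with the same expectation. The following sketch is ours and illustrates only the probabilistic mechanism, not the bookkeeping with equivalence classes; the function names are hypothetical.

```python
import itertools
import random


def rounding_distribution(g):
    """Distribution of the rounded value u for a number g in [-1, 1]:
    a dictionary mapping each possible value to its probability."""
    if g >= 0:
        return {1: g, 0: 1 - g}
    return {-1: -g, 0: 1 + g}


def sample_rounding(g, rng):
    """Draw one rounded value according to rounding_distribution(g)."""
    if g >= 0:
        return 1 if rng.random() < g else 0
    return -1 if rng.random() < -g else 0


def expectation(g):
    return sum(value * prob for value, prob in rounding_distribution(g).items())


def expected_product(gs):
    """Exact expectation of u_1 * ... * u_k for independent roundings."""
    dists = [list(rounding_distribution(g).items()) for g in gs]
    total = 0.0
    for outcome in itertools.product(*dists):
        value, prob = 1, 1.0
        for v, p in outcome:
            value *= v
            prob *= p
        total += value * prob
    return total


rng = random.Random(0)
samples = [sample_rounding(-0.25, rng) for _ in range(5)]  # values in {-1, 0}
```

Because the roundings are independent, `expected_product([0.5, -0.25])` equals $0.5 \times (-0.25)$, which is the property used next to compare $E_\psi$ with $G_\psi$ in expectation.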

Now $E_\psi(\phi)$ can be non-zero only if $u_i(\phi) \ne 0$ for every $i$, and this is the case (with probability 1) only if $g_i(\phi) \ne 0$ for every $i$, and hence only if $G(\phi) \ne 0$. We defined $G(\phi)$ to be $G_\psi(\phi) = \prod_{B\in\mathcal{B},\ B\ne B_0}f^B(\phi, \psi)$. But $f^B(\phi, \psi) = 0$ unless $(\phi, \psi)(B) \in H_*([k])$ and this is true only if $(\phi, \psi)(C) \in \mathcal{H}$ for every proper subset $C$ of $B$. Therefore this product is non-zero only if $(\phi, \psi)$ is a homomorphism from $\mathcal{K}$ to $\mathcal{H}$.

Since the choices of the different functions $u_i$ were made independently and the expectation of $u_i(\phi)$ is $g_i(\phi)$, the expectation of $u_1(\phi)\cdots u_k(\phi)$ is $g_1(\phi)\cdots g_k(\phi) = G_\psi(\phi)$. Therefore, by linearity of expectation, the expectation of $\mathbb{E}_\phi\mathbb{E}_\psi F(\phi)E_\psi(\phi)$ is $\mathbb{E}_\phi\mathbb{E}_\psi F(\phi)G_\psi(\phi)$, which we have assumed to be at least $\eta\prod_{A\in\mathcal{K}}\delta_A$. It follows that we can choose functions $E_\psi$ with the desired properties. □

Lemma 6.6. For each $\psi \in \Psi$ let $E_\psi$ be the function constructed in Lemma 6.5, and let $\mathcal{D}$ be the double octahedron chain introduced before the statement of Lemma 6.3. Then

$$\mathbb{E}_{\phi\in\Phi}\Bigl(\mathbb{E}_{\psi\in\Psi}E_\psi(\phi)\Bigr)^2 \le 2\prod_{A\in\mathcal{D}}\delta_A.$$

Proof. The left-hand side of the inequality we wish to prove can be rewritten

$$\mathbb{E}_{\phi\in\Phi}\mathbb{E}_{\psi_1,\psi_2\in\Psi}E_{\psi_1}(\phi)E_{\psi_2}(\phi).$$

By Lemma 6.5, $E_{\psi_1}(\phi)E_{\psi_2}(\phi)$ is non-zero only if $(\phi, \psi_1)$ and $(\phi, \psi_2)$ belong to $\mathrm{Hom}(\mathcal{K}, \mathcal{H})$. Therefore, this sum is at most the probability, for a random triple $(\phi, \psi_1, \psi_2) \in \Phi \times \Psi^2$, that both $(\phi, \psi_1)$ and $(\phi, \psi_2)$ belong to $\mathrm{Hom}(\mathcal{K}, \mathcal{H})$.

In order to estimate this probability, we shall apply the counting lemma to the chain $\mathcal{D}$. Every edge of $\mathcal{D}$ is a proper subset of either $B_0 \cup B_1$ or $B_0 \cup B_2$. Let $\mathcal{K}_1$ be the set of all edges of the first kind and let $\mathcal{K}_2$ be the set of all edges of the second kind. Both $\mathcal{K}_1$ and $\mathcal{K}_2$ are chains and they intersect in a chain that consists of all proper subsets of $B_0$. Moreover, $\mathcal{K}_1$ is essentially the same chain as $\mathcal{K}$ (formally, it has different vertex sets but the edges are the same). As for $\mathcal{K}_2$, it is isomorphic to $\mathcal{K}$ in the following sense. Let $\gamma$ be the bijection from $B_0 \cup B_2$ to $B_0 \cup B_1$ that takes $(i, 0)$ to $(i, 0)$ and $(i, 2)$ to $(i, 1)$. Then $A$ is an edge of $\mathcal{K}_2$ if and only if $\gamma(A)$ is an edge of $\mathcal{K}$.

Let $\Theta$ be the set of all $k$-partite functions from $V_1 \cup \dots \cup V_k$ (the vertex set of $\mathcal{D}$) to $X_1 \cup \dots \cup X_k$. There is a one-to-one correspondence between $\Theta$ and $\Phi \times \Psi \times \Psi$, taking $\theta \in \Theta$ to $(\phi, \psi_1, \psi_2 \circ \gamma)$, where $\phi$, $\psi_1$ and $\psi_2$ are the restrictions of $\theta$ to $B_0$, $B_1$ and $B_2$, respectively. Since $\mathcal{D} = \mathcal{K}_1 \cup \mathcal{K}_2$, a map $\theta \in \Theta$ belongs to $\mathrm{Hom}(\mathcal{D}, \mathcal{H})$ if and only if $(\phi, \psi_1)$ belongs to $\mathrm{Hom}(\mathcal{K}_1, \mathcal{H})$ and $(\phi, \psi_2)$ belongs to $\mathrm{Hom}(\mathcal{K}_2, \mathcal{H})$. But this is true if and only if $(\phi, \psi_1)$ and $(\phi, \psi_2 \circ \gamma)$ belong to $\mathrm{Hom}(\mathcal{K}, \mathcal{H})$. (Note that $\psi_2 \circ \gamma$ here is the $\psi_2$ in the sum that we are estimating.)

What this shows is that the probability that we wish to estimate is equal to the probability that a random $\theta \in \Theta$ is a homomorphism from $\mathcal{D}$ to $\mathcal{H}$. Since we are assuming that $\mathcal{H}$ is $(\epsilon, \mathcal{D}, k-1)$-quasirandom and that $\epsilon \le |\mathcal{D}|^{-1}$, the counting lemma (Corollary 5.3) implies that this is at most $2\prod_{A\in\mathcal{D}}\delta_A$, which proves the lemma. □

Our next task is to show that we can make a small selection of the functions $E_\psi$ and keep properties similar to those proved in the last two lemmas. The selection will be done in the obvious way: randomly.

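Lemma 6.7. Let $\delta = \prod_{A\in\mathcal{D}}\delta_A$, let $\beta = \prod_{A\in\mathcal{K}}\delta_A$ and let $r \ge \delta^{-1}$ be a positive integer. Then there exist functions $E_1, \dots, E_r$ from $\Phi$ to $\{-1, 0, 1\}$ with the following four properties.

(i) Each function $E_i$ is a product of $\{-1, 0, 1\}$-valued $A$-functions over subsets $A \subset B_0$ of size $k-1$.

(ii) For each $i$ and each $\phi \in \Phi$, $E_i(\phi)$ is non-zero only if $\phi(B_0) \in H_*([k])$.

(iii) $\mathbb{E}_{i\le r}\mathbb{E}_{\phi\in\Phi}F(\phi)E_i(\phi) \ge (\eta/2)\beta$.

(iv) $\mathbb{E}_{\phi\in\Phi}\bigl(\mathbb{E}_{i\le r}E_i(\phi)\bigr)^2 \le (8\delta/\eta\beta)\,\mathbb{E}_{i\le r}\mathbb{E}_{\phi\in\Phi}F(\phi)E_i(\phi)$.

Here $\mathbb{E}_{i\le r}$ denotes the average over $i = 1, \dots, r$.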

Proof. For each $i$ let $E_i$ be one of the functions $E_\psi$, where $\psi$ is chosen uniformly at random from $\Psi$. Let the choices be independent (so, in particular, the $E_i$ are not necessarily distinct, though they probably will be). Then it follows from Lemma 6.5 that property (i) holds, and also that the expectation of $\mathbb{E}_{i\le r}\mathbb{E}_{\phi\in\Phi}F(\phi)E_i(\phi)$ is at least $\eta\beta$.

We now want to estimate the expectation of $\mathbb{E}_{\phi\in\Phi}\bigl(\mathbb{E}_{i\le r}E_i(\phi)\bigr)^2$, and for this we shall use Lemma 6.2, the technical lemma from the beginning of the section. Set $n = |\Psi| = |\Phi|$ and let the vectors $v_1, \dots, v_n$ be the functions $E_\psi$, which we regard as elements of $L_2(\Phi)$. Lemma 6.6 tells us that $\|\mathbb{E}_i v_i\|_2^2 \le 2\delta$. Therefore, Lemma 6.2 tells us that the expectation of $\|\mathbb{E}_{i\le r}E_i\|_2^2$, which is the same as the expectation of $\mathbb{E}_{\phi\in\Phi}\bigl(\mathbb{E}_{i\le r}E_i(\phi)\bigr)^2$, is at most $4\delta$.

It follows that the expectation of

$$8\delta\,\mathbb{E}_{i\le r}\mathbb{E}_{\phi\in\Phi}F(\phi)E_i(\phi) - \eta\beta\,\mathbb{E}_{\phi\in\Phi}\Bigl(\mathbb{E}_{i\le r}E_i(\phi)\Bigr)^2$$

is at least $8\eta\beta\delta - 4\eta\beta\delta = 4\eta\beta\delta$. It follows that there must be some choice of the functions $E_1, \dots, E_r$ such that the inequalities (iii) and (iv) are satisfied.

Since each $E_i$ is one of the functions $E_\psi$, Lemma 6.5 implies that $E_i(\phi)$ is non-zero only if $(\phi, \psi) \in \mathrm{Hom}(\mathcal{K}, \mathcal{H})$ for some $\psi \in \Psi$. But a necessary condition for this is that $\phi(B_0) \in H_*([k])$, so property (ii) is true as well. □

Proof of Lemma 6.3. For each $i$ let us write $E_i$ as a product $\prod_{j=1}^k E_{ij}$, where $E_{ij}$ is a $\{-1, 0, 1\}$-valued $A_j$-function. (As in the proof of Lemma 6.5, $A_j$ is the set $B_0 \setminus \{(j, 0)\}$.)

For each $j \le k$ we can partition the part $H_j$ of $H$ into at most $3^r$ sets, such that on each of these sets the function $E_{ij}$ is constant for every $i \le r$. Let $Z_1, \dots, Z_N$ be the corresponding induced partition of $H_*([k])$. (This concept was defined just before the statement of Lemma 6.3.) Then every function $E_i$ is constant on every cell $Z_j$, from which it follows that the function $g(\phi) = \mathbb{E}_{i\le r}E_i(\phi)$ is constant on every cell $Z_j$. (Here we are implicitly thinking of $g$ as a function of $\phi(B_0)$ and therefore defined on $H_*([k])$.)

With the help of Lemma 6.7, we are now in a position to apply Lemma 6.1. Property (iii) of Lemma 6.7 tells us that $\langle F, g\rangle \ge (\eta/2)\beta$, and property (iv) tells us that $\langle F, g\rangle/\|g\|_2^2 \ge \eta\beta/8\delta$.

Let $U$ be the set of all $\phi \in \Phi$ such that $\phi(B_0) \in H_*([k])$. Then the map $\phi \mapsto \phi(B_0)$ is a bijection between $U$ and $H_*([k])$, so we can regard $Z_1, \dots, Z_N$ as a partition of $U$, and we can also regard $F$ and $g$ as functions defined on $U$. If we do so, then their $L_2$-norms and inner products change: now we have $\langle F, g\rangle \ge (\eta/2)\beta/\zeta$, where $\zeta$ is the density of $U$ in $\Phi$, while the ratio $\langle F, g\rangle/\|g\|_2^2$ remains the same, so it is still at least $\eta\beta/8\delta$.

Lemma 6.1 and these estimates tell us that the mean-square density of $F$ with respect to this partition of $U$ is at least $(\eta\beta/2\zeta)(\eta\beta/8\delta) = \eta^2\beta^2/16\zeta\delta$. By Corollary 5.3 (the counting lemma), $\zeta \le 2\prod_{A\subsetneq B_0}\delta_A$. Recall that every set $A \subsetneq B_0$ is the index of precisely $2^{|A|+1} - 1$ sets in $\mathcal{D}$ and $2^{|A|}$ sets in $\mathcal{K}$. It follows that $\beta^2 = \delta\prod_{A\subsetneq B_0}\delta_A \ge \zeta\delta/2$. Therefore, the mean-square density of $F$ with respect to the partition $Z_1, \dots, Z_N$ is at least $\eta^2/32$. Since $F(\phi) = f(\phi(B_0))$, this statement is equivalent to the statement of Lemma 6.3. □

Corollary 6.8. Let $\mathcal{H}$ be a $k$-partite $(k-1)$-chain with vertex sets $X_1, \dots, X_k$, let $\mathcal{D}$ be the double octahedron, let $\delta = \prod_{A\in\mathcal{D}}\delta_A$ and let $r \ge \delta^{-1}$ be a positive integer. Suppose that $\epsilon \le |\mathcal{D}|^{-1}$ and that $\mathcal{H}$ is $(\epsilon, \mathcal{D}, k-1)$-quasirandom. Let $H^k$ be a $k$-partite $k$-uniform hypergraph with vertex sets $X_1, \dots, X_k$, let the density of $H^k$ relative to $\mathcal{H}$ (that is, the quantity $|H^k|/|H_*([k])|$) be $\delta_{[k]}$ and suppose that $H^k$ is not $\eta$-quasirandom relative to $\mathcal{H}$. Let $H$ be the set of all edges of $\mathcal{H}$ of size $k-1$ and let $H_1, \dots, H_k$ be the $k$ parts of $H$. Then there are partitions of the $H_i$ into at most $3^r$ sets each such that the mean-square density of (the characteristic function of) $H^k$ with respect to the induced partition of $H_*([k])$ is at least $\delta_{[k]}^2 + \eta^2/32$.

Proof. Let $f : H_*([k]) \to [-1, 1]$ be the function $H^k - \delta_{[k]}$. Then the statement that
$H^k$ is not $\eta$-quasirandom relative to $\mathcal{H}$ is, by definition, the statement that $f$ is not
$\eta$-quasirandom relative to $\mathcal{H}$. Therefore, by Lemma 6.3, we can find partitions of the required
kind for which the mean-square density of $f$ with respect to the induced partition of $H_*([k])$
is at least $\eta^2/32$.

Let $Z_1, \dots, Z_N$ be the induced partition of $H_*([k])$ and for each $(x_1, \dots, x_k) \in Z_i$ let
$G(x_1, \dots, x_k) = |H^k \cap Z_i|/|Z_i|$. Then the mean of $G$ is the same as the mean of $H^k$,
namely $\delta_{[k]}$. The value that $G$ takes in $Z_i$ can also be written as $\delta_{[k]} + \mathbb{E}_{x \in Z_i} f(x)$, so the
expectation of $(G - \delta_{[k]})^2$, which is also the mean-square density of $G - \delta_{[k]}$ (since $G$ is
constant on the cells $Z_i$) is the mean-square density of $f$. But it is also the variance of $G$,
so by the usual formula $\mathrm{var}\,X = \mathbb{E}X^2 - (\mathbb{E}X)^2$ we find that the mean-square density of
$G$ is $\delta_{[k]}^2$ plus the mean-square density of $f$. (Here we have again used the fact that $G$ is
constant on cells, so that the mean-square density of $G$ is just $\mathbb{E}G^2$.) The result follows. □
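
The variance identity used in this proof can be checked numerically on a toy example. The following Python sketch is only an illustration: the ground set of ten points (standing in for $H_*([k])$), the three cells and the edge set are arbitrary choices made here, and the helper name mean_square_density is hypothetical rather than anything taken from the paper.

```python
from fractions import Fraction

def mean_square_density(values, cells):
    """Mean-square density of a function (dict: point -> value) with respect
    to a partition (list of lists of points): the weighted average over the
    cells of the squared mean of the function on each cell."""
    total = sum(len(cell) for cell in cells)
    msd = Fraction(0)
    for cell in cells:
        average = sum(Fraction(values[p]) for p in cell) / len(cell)
        msd += Fraction(len(cell), total) * average * average
    return msd

# Toy data (assumed for illustration): ten points split into three cells.
cells = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9]]
edges = {0, 1, 2, 4, 9}
indicator = {p: (1 if p in edges else 0) for p in range(10)}

delta = Fraction(len(edges), 10)                 # overall density
f = {p: indicator[p] - delta for p in range(10)} # balanced function

msd_indicator = mean_square_density(indicator, cells)
msd_f = mean_square_density(f, cells)

# The identity from the proof: msd(indicator) = delta^2 + msd(f).
assert msd_indicator == delta ** 2 + msd_f
```

Exact rational arithmetic is used so that the identity holds with equality rather than up to rounding.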

§7. The statement of a regularity lemma for r-partite chains. 

Corollary 6.8 is stage one of the proof of our regularity lemma. In this short section 
we will introduce some definitions and state the regularity lemma itself. The proof (or 
rather, stage two of the proof) will be given in §9. 

Broadly speaking, the result says that we can take a $k$-uniform hypergraph $H$, regard
it as a chain (by adding all subsets of edges of $H$) and decompose that chain into subchains
almost all of which are quasirandom. This is a useful thing to do, because Corollary 5.2
gives us a good understanding of quasirandom chains. Thus, the regularity lemma and
counting lemma combine to allow us to decompose any (dense) $k$-uniform hypergraph
into pieces that we can control. In the final section of the paper we shall exploit this by
proving a generalization of Theorems 1.3 and 1.6 to $k$-uniform hypergraphs, which implies
the multidimensional Szemerédi theorem.

Our principal aim will be to understand a certain $(k+1)$-partite $k$-uniform hypergraph.
However, for the purposes of formulating a suitable inductive hypothesis it is helpful to
prove a result that is more general in two ways. First of all, we shall look at $r$-partite
$k$-uniform hypergraphs. Secondly, rather than looking at single hypergraphs we shall
look at partitions. To be precise, let $X_1, \dots, X_r$ be a sequence of finite sets. Given any
subset $A \subset [r]$, $A = \{i_1, \dots, i_s\}$, let $K(A)$ be the complete $s$-uniform hypergraph on the
sets $X_{i_1}, \dots, X_{i_s}$, that is, the hypergraph consisting of all subsets of $X_1 \cup \dots \cup X_r$ that
intersect $X_i$ in a singleton if $i \in A$ and are disjoint from $X_i$ otherwise. For each $s \leq r$,
the complete $s$-uniform hypergraph $K_s(X_1, \dots, X_r)$ on the sets $X_1, \dots, X_r$ is the union of
the hypergraphs $K(A)$ over all sets $A \subset [r]$ of size $s$. Finally, the complete $k$-chain on
$X_1, \dots, X_r$, denoted $\mathcal{K}_k(X_1, \dots, X_r)$, is the union of all $K(A)$ such that $A$ has cardinality
at most $k$: that is, it consists of all subsets of $X_1 \cup \dots \cup X_r$ of size at most $k$ that intersect
each $X_i$ at most once.

To form an arbitrary $r$-partite $s$-uniform hypergraph $H$ with vertex sets $X_1, \dots, X_r$,
one can choose, for each $A \subset [r]$ of size $s$, a subset $H(A) \subset K(A)$ and let $H$ be the union of
these hypergraphs $H(A)$. If we want to, we can regard each $H(A)$ as a partition of $K(A)$
into the two sets $H(A)$ and $K(A) \setminus H(A)$. Our regularity lemma will be concerned with
more general partitions, but it will imply a result for hypergraphs as an easy corollary.

Suppose now that for every subset $A \subset [r]$ of size at most $k$ we have a partition of the
hypergraph $K(A)$. If $B$ and $B'$ are two edges of this hypergraph (that is, if they are two
sets of index $A$), let us write $B \sim_A B'$ if $B$ and $B'$ lie in the same cell of the partition,
and say that $B$ and $B'$ are $A$-equivalent.

One can use these equivalence relations to define finer ones as follows. Given two
sets $B$, $B'$ of index $A$ and given any subset $C \subset A$, there are unique subsets $D \subset B$ and
$D' \subset B'$ of index $C$. Let us say that $B$ and $B'$ are $C$-equivalent if $D$ and $D'$ are. Then
let us say that $B$ and $B'$ are strongly equivalent if they are $C$-equivalent for every subset
$C \subset A$. In other words, we ask not only for $B$ to belong to the same cell as $B'$, but also
for every subset of $B$ to belong to the same cell as the corresponding subset of $B'$ in the
corresponding partition.

Given this system of equivalence relations, we can define a collection of chains as
follows. For every $r$-tuple $x = (x_1, \dots, x_r) \in X_1 \times \dots \times X_r$ and every set $A$ of size at most
$k$, let $x(A)$ be the set $\{x_i : i \in A\}$ and let $H(A, x)$ be the hypergraph consisting of all sets
$B$ that are strongly equivalent to $x(A)$.

Lemma 7.1. The union $\mathcal{H} = \mathcal{H}(x)$ of the hypergraphs $H(A, x)$ over all sets $A$ of size at
most $k$ is an $r$-partite $k$-chain.

Proof. Let $B \in H(A, x)$ and let $D \subset B$. Let $C$ be the index of $D$. Since $B$ is strongly
equivalent to $x(A)$, $D$ is strongly equivalent to $x(C)$. Therefore $D \in H(C, x)$ and the
lemma is proved. □



Lemma 7.2. Let $x = (x_1, \dots, x_r)$ and $y = (y_1, \dots, y_r)$ belong to the set $X_1 \times \dots \times X_r$
and let $\mathcal{H}(x)$ and $\mathcal{H}(y)$ be the two chains constructed as in Lemma 7.1. Then for every
set $A \subset [r]$ of size at most $k$, the hypergraphs $H(A, x)$ and $H(A, y)$ are either equal or
disjoint.

Proof. Suppose that $B$ is a set of index $A$ and that $B \in H(A, x) \cap H(A, y)$. Then $B$
is strongly equivalent to both $x(A)$ and $y(A)$, so these two sets are strongly equivalent to
each other. It follows that $H(A, x) = H(A, y)$. □
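
To make the definitions of strong equivalence and of the hypergraphs $H(A, x)$ concrete, here is a small Python sketch that builds them for a toy system of partitions and checks the conclusions of Lemmas 7.1 and 7.2 by brute force. Everything specific in it is an assumption made for illustration: the vertex sets, the choice $r = 3$, $k = 2$, the 0-based indices, the particular partitions encoded by the function cell, and all function names. Only non-empty index sets are considered.

```python
from itertools import combinations, product

# Hypothetical toy instance: r = 3 vertex sets, chains of height k = 2.
r, k = 3, 2
X = [range(3), range(3), range(2)]

def nonempty_subsets(A):
    """All non-empty subsets C of the index set A, as sorted tuples."""
    return [C for s in range(1, len(A) + 1) for C in combinations(A, s)]

def restrict(A, B, C):
    """The unique subset of B (a set of index A) that has index C."""
    return tuple(B[A.index(i)] for i in C)

def cell(C, D):
    """Label of the cell containing D in an arbitrary illustrative
    partition of K(C): parity for singletons, an order test for pairs."""
    if len(C) == 1:
        return D[0] % 2
    return int(D[0] < D[1])

def strong_key(A, B):
    """Two sets of index A are strongly equivalent iff their keys agree."""
    return tuple(cell(C, restrict(A, B, C)) for C in nonempty_subsets(A))

def H(A, x):
    """H(A, x): all sets of index A that are strongly equivalent to x(A)."""
    target = strong_key(A, tuple(x[i] for i in A))
    candidates = product(*(X[i] for i in A))
    return frozenset(B for B in candidates if strong_key(A, B) == target)

index_sets = [A for A in nonempty_subsets(tuple(range(r))) if len(A) <= k]
points = list(product(*X))

# Lemma 7.1: the union of the H(A, x) is closed under taking subsets.
lemma_7_1 = all(restrict(A, B, C) in H(C, x)
                for x in points for A in index_sets
                for B in H(A, x) for C in nonempty_subsets(A))

# Lemma 7.2: H(A, x) and H(A, y) are either equal or disjoint.
lemma_7_2 = all(H(A, x) == H(A, y) or not (H(A, x) & H(A, y))
                for A in index_sets for x in points for y in points)
```

A set of index $A$ is stored as a tuple of vertices listed in increasing order of index, so that restricting to a sub-index $C$ is a matter of selecting coordinates.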

Let us call two $r$-partite $k$-chains $\mathcal{H}$ and $\mathcal{H}'$ with the same vertex sets $X_1, \dots, X_r$
compatible if, for every subset $A \subset [r]$ of size at most $k$, the hypergraphs $H(A)$ and $H'(A)$
are either equal or disjoint. By a chain decomposition of the complete $r$-partite $k$-chain
$\mathcal{K}_k(X_1, \dots, X_r)$ we mean a set $\{\mathcal{H}_1, \dots, \mathcal{H}_n\}$ of $r$-partite $k$-chains with the following two
properties:

(i) for every $i$ and $j$ the chains $\mathcal{H}_i$ and $\mathcal{H}_j$ are compatible;

(ii) for every sequence $x = (x_1, \dots, x_r) \in X_1 \times \dots \times X_r$ there is precisely one chain
from the set $\{\mathcal{H}_1, \dots, \mathcal{H}_n\}$ that contains every subset of $\{x_1, \dots, x_r\}$ of size at
most $k$.

Note that a chain decomposition is not a partition of $\mathcal{K}_k(X_1, \dots, X_r)$. There is no
interesting way to partition $\mathcal{K}_k(X_1, \dots, X_r)$ into subchains, as a moment's thought will
reveal. Lemmas 7.1 and 7.2 show that the chains $\mathcal{H}(x)$ form a chain decomposition of
$\mathcal{K}_k(X_1, \dots, X_r)$. (It may be that $\mathcal{H}(x) = \mathcal{H}(y)$, but this does not contradict (ii) because
we have carefully defined a chain decomposition to be a set of chains rather than a sequence
of chains.)

We are now ready to state our regularity lemma. 

Theorem 7.3. Let $\mathcal{J}$ be an $r$-partite $k$-chain with vertex sets $E_1, \dots, E_r$ and let $0 < \epsilon \leq
|\mathcal{J}|^{-1}$. Let $X_1, \dots, X_r$ be a sequence of finite sets and for each subset $A \subset [r]$ of size at most
$k$ let $\mathcal{P}(A)$ be a partition of the hypergraph $K(A)$ into $n_A$ sets. Then there are refinements
$\mathcal{Q}(A)$ of the partitions $\mathcal{P}(A)$ leading to a chain decomposition of $\mathcal{K}_k(X_1, \dots, X_r)$ with the
following property: if $x = (x_1, \dots, x_r)$ is a randomly chosen element of $X_1 \times \dots \times X_r$ then
the probability that the chain $\mathcal{H}(x)$ is $(\epsilon, \mathcal{J}, k)$-quasirandom is at least $1 - \epsilon$. Moreover,
$\mathcal{Q}(A) = \mathcal{P}(A)$ when $|A| = k$, and for general $A$ the number of sets $m_A$ in the partition
$\mathcal{Q}(A)$ depends only on $\epsilon$, $\mathcal{J}$, $k$ and the numbers $n_C$.

Before we start on the proof, let us comment on how we shall actually use Theorem
7.3. We will be presented with an $r$-partite $k$-uniform hypergraph $H$ with vertex sets
$X_1, \dots, X_r$. All the $\binom{r}{k}$ $k$-partite parts $H(A)$ of $H$ will have density at least a certain
fixed $d > 0$. We will then apply Theorem 7.3 to the partitions $\mathcal{P}(A)$ defined as follows.
If $|A| = k$ then $\mathcal{P}(A)$ will be $\{H(A), K(A) \setminus H(A)\}$. If $|A| < k$ then it will be the trivial
partition $\{K(A)\}$. In this case, the result will tell us that we can find partitions $\mathcal{Q}(A)$ such
that almost all edges of $H$ lie in quasirandom chains from the decomposition determined
by the partitions $\mathcal{Q}(A)$.

§8. Basic facts about partitions and mean-square density. 

In order to prove a regularity lemma for systems of partitions, we need to generalize
the notion of mean-square density as follows. Let $\mathcal{P} = \{X_1, \dots, X_r\}$ and $\mathcal{Q} = \{Y_1, \dots, Y_s\}$
be two partitions of a finite set $U$. Then the mean-square density of $\mathcal{P}$ with respect to $\mathcal{Q}$
is the quantity

$$\sum_{i=1}^{r} \sum_{j=1}^{s} \frac{|Y_j|}{|U|} \left( \frac{|X_i \cap Y_j|}{|Y_j|} \right)^2,$$

that is, the sum of all the mean-square densities of the sets $X_i$ (by which we mean the
mean-square densities of their characteristic functions, as defined in §6) with respect to $\mathcal{Q}$.

Since, for each fixed $j$, the numbers $|X_i \cap Y_j|/|Y_j|$ are non-negative and sum to 1, we have
the simple upper bound

$$\sum_{i=1}^{r} \sum_{j=1}^{s} \frac{|Y_j|}{|U|} \left( \frac{|X_i \cap Y_j|}{|Y_j|} \right)^2 \leq \sum_{j=1}^{s} \frac{|Y_j|}{|U|} = 1$$

for this quantity. An alternative way of seeing this, which will be helpful later, is to notice
that each $u \in U$ is contained in a unique $X_i$ and a unique $Y_j$, and the mean-square density
of $\mathcal{P}$ with respect to $\mathcal{Q}$ is the expected value of $|X_i \cap Y_j|/|Y_j|$.

Lemma 8.1. Let $\mathcal{P} = \{X_1, \dots, X_r\}$ and $\mathcal{Q} = \{Y_1, \dots, Y_s\}$ be two partitions of a finite
set $U$, and let $\mathcal{Q}'$ be a refinement of $\mathcal{Q}$. Then the mean-square density of $\mathcal{P}$ with respect
to $\mathcal{Q}'$ is at least as great as the mean-square density of $\mathcal{P}$ with respect to $\mathcal{Q}$.

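Proof. Let the sets that make up $\mathcal{Q}'$ be called $Y_{jk}$, where $Y_j = \bigcup_k Y_{jk}$. For each $j$
and $k$ define $\gamma_j$ and $\gamma_{jk}$ by $|Y_j| = \gamma_j|U|$ and $|Y_{jk}| = \gamma_{jk}|U|$. For each $i$, $j$ and $k$ let
$d_{ij} = |X_i \cap Y_j|/|Y_j|$ and let $d_{ijk} = |X_i \cap Y_{jk}|/|Y_{jk}|$. Then

$$\sum_k d_{ijk}|Y_{jk}| = \sum_k |X_i \cap Y_{jk}| = |X_i \cap Y_j| = d_{ij}|Y_j|,$$

from which it follows that $\sum_k \gamma_{jk}d_{ijk} = \gamma_j d_{ij}$ for every $i$ and $j$.

The mean-square density of $\mathcal{P}$ with respect to $\mathcal{Q}$ is $\sum_i \sum_j \gamma_j d_{ij}^2$, which is therefore
equal to

$$\sum_i \sum_j \gamma_j^{-1} \Bigl( \sum_k \gamma_{jk}d_{ijk} \Bigr)^2 = \sum_i \sum_j \gamma_j^{-1} \Bigl( \sum_k \gamma_{jk}^{1/2} \gamma_{jk}^{1/2} d_{ijk} \Bigr)^2 \leq \sum_i \sum_j \gamma_j^{-1} \Bigl( \sum_k \gamma_{jk} \Bigr) \Bigl( \sum_k \gamma_{jk}d_{ijk}^2 \Bigr)$$

by the Cauchy-Schwarz inequality. Since $\gamma_j^{-1} \sum_k \gamma_{jk} = 1$ for every $j$, the last expression equals
$\sum_i \sum_j \sum_k \gamma_{jk}d_{ijk}^2$, which is the mean-square density of $\mathcal{P}$ with respect to $\mathcal{Q}'$. □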
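
The definition of the mean-square density of one partition with respect to another, together with the monotonicity asserted by Lemma 8.1, can be illustrated by a short computation. The Python sketch below is an illustration only: the ground set, the three partitions and the function name msd_partition are choices made here, not taken from the paper.

```python
from fractions import Fraction

def msd_partition(P, Q):
    """Mean-square density of the partition P with respect to the partition Q.
    Both are lists of disjoint sets covering the same finite ground set U:
    sum over i, j of (|Y_j| / |U|) * (|X_i & Y_j| / |Y_j|)^2."""
    size_U = sum(len(Y) for Y in Q)
    total = Fraction(0)
    for X in P:
        for Y in Q:
            total += Fraction(len(Y), size_U) * Fraction(len(X & Y), len(Y)) ** 2
    return total

# Toy partitions of U = {0, ..., 11} (assumed for illustration).
P = [set(range(0, 5)), set(range(5, 12))]
Q = [set(range(0, 6)), set(range(6, 12))]
# Q_refined splits the first cell of Q into two halves.
Q_refined = [set(range(0, 3)), set(range(3, 6)), set(range(6, 12))]

coarse = msd_partition(P, Q)
fine = msd_partition(P, Q_refined)

# Lemma 8.1: refining Q cannot decrease the mean-square density,
# and the simple upper bound says that it never exceeds 1.
assert coarse <= fine <= 1
```

In this example the refinement raises the value from 31/36 to 8/9, because the new cell $\{0, 1, 2\}$ lies entirely inside one part of $\mathcal{P}$.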

The next lemma is a simple, but somewhat irritating, technicality. 

Lemma 8.2. Let $\epsilon > 0$, let $X_1, \dots, X_r$ be a sequence of finite sets, let $\mathcal{K}_k(X_1, \dots, X_r)$ be
the complete $r$-partite $k$-chain with vertex sets $X_1, \dots, X_r$, and for each $A \subset \{1, 2, \dots, r\}$
of size at most $k$ let $\mathcal{P}(A)$ be a partition of $K(A)$ into $n_A$ sets. For each $x = (x_1, \dots, x_r) \in
X_1 \times \dots \times X_r$ and each $A$ of size at most $k$ let $\delta_{A,x}$ be the relative density of the hypergraph
$H(A, x)$ in the chain $\mathcal{H}(x)$ (defined in the previous section). Then if $(x_1, \dots, x_r)$ is chosen
randomly from $X_1 \times \dots \times X_r$ and $A \subset \{1, 2, \dots, r\}$ has size at most $k$, the probability that
$\delta_{A,x} < \epsilon n_A^{-1}$ is at most $\epsilon$.

Proof. Let $B$ and $B'$ be two sets of index $A$. Let us call them weakly equivalent if $B$ is
$C$-equivalent to $B'$ for every proper subset $C$ of $A$. Then $B$ is strongly
equivalent to $B'$ if and only if $B$ and $B'$ are both weakly equivalent and $A$-equivalent.

The relative density $\delta_{A,x}$ is simply the probability that a set $B$ of index $A$ is strongly
equivalent to $x(A)$ given that it is weakly equivalent to $x(A)$. Since $K(A)$ is partitioned
into $n_A$ sets, the number of strong equivalence classes in each weak equivalence class is at
most $n_A$. Therefore, for any weak equivalence class $T$, the probability that $x(A)$ lies in
a strong equivalence class of size less than $\epsilon n_A^{-1}|T|$ given that it lies in $T$ is at most $\epsilon$. If
$x(A)$ lies in a strong equivalence class of size at least $\epsilon n_A^{-1}|T|$, then the probability that $B$
is in the same strong equivalence class given that $B$ is in $T$ is at least $\epsilon n_A^{-1}$, which implies
that $\delta_{A,x} \geq \epsilon n_A^{-1}$.

Therefore, for every $T$ the conditional probability that $\delta_{A,x} < \epsilon n_A^{-1}$ given that $x(A) \in
T$ is less than $\epsilon$. The result follows. □
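
The counting step in this proof is a pigeonhole estimate: if a weak equivalence class $T$ is split into at most $n_A$ strong classes, then the classes of size less than $\epsilon n_A^{-1}|T|$ together contain less than an $\epsilon$-fraction of $T$. A minimal Python sketch of that estimate follows; the class sizes, the value of $\epsilon$ and the function name are illustrative assumptions.

```python
from fractions import Fraction

def small_class_fraction(class_sizes, n_A, eps):
    """Fraction of a weak class T lying in strong classes of size
    less than eps * |T| / n_A, where T is split into len(class_sizes)
    strong classes and len(class_sizes) <= n_A."""
    assert len(class_sizes) <= n_A
    size_T = sum(class_sizes)
    threshold = eps * Fraction(size_T, n_A)
    small_total = sum(s for s in class_sizes if s < threshold)
    return Fraction(small_total, size_T)

# Toy weak class of 100 sets split into four strong classes.
sizes = [1, 1, 2, 96]
eps = Fraction(1, 10)
fraction = small_class_fraction(sizes, n_A=4, eps=eps)

# At most n_A classes, each smaller than eps*|T|/n_A, cover less than eps*|T|.
assert fraction < eps
```

Here the threshold is 2.5, so the three small classes contribute 4 of the 100 sets, well under the permitted 10.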



We now have all the ingredients needed to prove our regularity lemma. 

§9. The proof of Theorem 7.3. 

It will be convenient for the proof if for each set $A \subset [r]$ of size at most $k$, the chain
$\mathcal{J}$ contains a copy $\mathcal{D}_A$ of the double octahedron of dimension $|A|$. Since the result for $\mathcal{J}$
follows from the result for any larger chain, we are free to assume that this is the case.

We shall first describe an inductive procedure for producing better and better systems 
of partitions when the conclusion of Theorem 7.3 does not hold. Then we shall prove that 
the procedure terminates. 

We shall need one piece of notation. Let $X_1, \dots, X_r$ be a sequence of finite sets and
for each subset $C \subset [r]$ of size at most $k$ let $\mathcal{P}(C)$ be a partition of the hypergraph $K(C)$.
For each set $A \subset [r]$ of size at most $k$ we shall write $\sigma_A(\mathcal{P})$ for the mean-square density
of the partition $\mathcal{P}(A)$ with respect to the partition of $K(A)$ into weak equivalence classes
with respect to the partition system $\mathcal{P}$. (These were defined in the proof of Lemma 8.2
above.)

Lemma 9.1. Let $\mathcal{J}$ be an $r$-partite $k$-chain with vertex sets $E_1, \dots, E_r$ and let $0 <
\epsilon \leq |\mathcal{J}|^{-1}$. Let $X_1, \dots, X_r$ be a sequence of finite sets and for each subset $C \subset [r]$
of size at most $k$ let $\mathcal{P}(C)$ be a partition of the hypergraph $K(C)$ into $n_C$ sets. For
each $x = (x_1, \dots, x_r)$, let $\mathcal{H}(x)$ be the chain arising from $x$ and the corresponding chain
decomposition of $\mathcal{K}_k(X_1, \dots, X_r)$. Suppose that when $x$ is chosen randomly from $X_1 \times
\dots \times X_r$ the probability that $\mathcal{H}(x)$ fails to be $(\epsilon, \mathcal{J}, k)$-quasirandom is at least $\epsilon$. Then
there is a set $A$ of size $s \leq k$ and a system of refinements $\mathcal{Q}(C)$ of the partitions $\mathcal{P}(C)$
with the following properties.

(i) $\mathcal{Q}(C) = \mathcal{P}(C)$ and $\sigma_C(\mathcal{Q}) \geq \sigma_C(\mathcal{P})$ except if $C \subset A$ and $|C| = s - 1$.

(ii) $\sigma_A(\mathcal{Q})$ exceeds $\sigma_A(\mathcal{P})$ by a non-zero amount that depends only on $\mathcal{J}$, $\epsilon$, $k$ and the
numbers of cells in the partitions $\mathcal{P}(B)$ with $|B| \leq s$.

(iii) When $C \subset A$ and $|C| = s - 1$, the number of cells in the partition $\mathcal{Q}(C)$ depends
only on $\epsilon$, $k$ and the numbers of cells in the partitions $\mathcal{P}(B)$ with $B \subset C$.

Proof. For each set $C$, let $t_C$ be the number of cells in the partition $\mathcal{P}(C)$ of $K(C)$. Let
$\gamma$ be defined by the equation $2\gamma \sum_{i=1}^{k} \binom{r}{i} = \epsilon$. By Lemma 8.2, the probability that there
exists a subset $C \subset [r]$ of size at most $k$ such that $\delta_{C,x} < \gamma t_C^{-1}$ is at most $\gamma \sum_{i=1}^{k} \binom{r}{i} = \epsilon/2$.
Therefore, with probability at least $\epsilon/2$, the chain $\mathcal{H}(x)$ fails to be $(\epsilon, \mathcal{J}, k)$-quasirandom
but for each $C$ the relative density $\delta_{C,x}$ is at least $\gamma t_C^{-1}$.

Let $\eta_2, \dots, \eta_k$ and $\epsilon_2, \dots, \epsilon_k$ be the sequences that appear in the definition of
quasirandom chains (in subsection 3.7), and note that $\eta_s$ depends only on $\epsilon$ and the densities
$\delta_{B,x}$ with $|B| \leq s$. Since $\delta_{C,x} \geq \gamma t_C^{-1}$ for every $C$, it follows that $\eta_s$ is bounded below by
a function of $\epsilon$ and all those $t_B$ for which $|B| \leq s$.

If $\mathcal{H}(x)$ fails to be $(\epsilon, \mathcal{J}, k)$-quasirandom, then there must be a minimal $s$ such that it
fails to be $(\epsilon_s, \mathcal{J}, s)$-quasirandom, and for that $s$ there must be a set $A$ of size $s$ such that
$H(A, x)$ is not $\eta_s$-quasirandom relative to $\mathcal{H}(x)$, while $\mathcal{H}(x)$ is $(\epsilon_{s-1}, \mathcal{J}, s-1)$-quasirandom.
Since there are at most $\sum_{i=1}^{k} \binom{r}{i}$ possibilities for this set $A$ we may deduce from the last
paragraph but one that there exists a set $A$ of size $s \leq k$ such that, with probability at
least $\gamma$, the chain $\mathcal{H}(x)$ is $(\epsilon_{s-1}, \mathcal{J}, s-1)$-quasirandom but $H(A, x)$ is not $\eta_s$-quasirandom
relative to $\mathcal{H}(x)$ and $\delta_{C,x} \geq \gamma t_C^{-1}$ for every $C$.

Let us call $x$ irregular if $\mathcal{H}(x)$ has these two properties. Given an irregular $x$, let
$\mathcal{H}^-(A, x)$ be the $s$-partite $(s-1)$-chain made up of all the hypergraphs $H(C, x)$ with $C \subsetneq A$.
We can now apply Corollary 6.8 to the chain $\mathcal{H}^-(A, x)$ and to the $s$-uniform hypergraph
$H(A, x)$. (Thus, the $k$ of Corollary 6.8 is equal to $s$ here.) Since $\epsilon_{s-1} \leq \epsilon \leq |\mathcal{J}|^{-1}$ and
$\mathcal{D}_A \subset \mathcal{J}$, the conditions hold for the corollary to be applicable, with $k$ replaced by $s$.
The hypergraphs $H_1, \dots, H_k$ in the statement of Corollary 6.8 are, in this context, the
hypergraphs $H(A', x)$, where $A'$ ranges over all subsets of $A$ of size $s - 1$.

For each $C \subset A$ we know that $\delta_{C,x} \geq \gamma t_C^{-1}$. Therefore, if $r'$ is a positive integer that is
at least $\prod_{C \in \mathcal{D}_A} \gamma^{-1} t_C$, then for each subset $A' \subset A$ of size $s - 1$ we can find a partition of
$H(A', x)$ into at most $3^{r'}$ subsets, in such a way that the mean-square density of $H(A, x)$
with respect to the induced partition of $H_*(A, x)$ is at least $\delta_{A,x}^2 + \eta_s^2/32$. (Here, $H_*(A, x)$
denotes the hypergraph consisting of all sets $Y$ of index $A$ such that every proper subset
of $Y$ belongs to $\mathcal{H}^-(A, x)$.)

Let $\mathcal{H}(A, x)$ be the $s$-partite $s$-chain $H(A, x) \cup \mathcal{H}^-(A, x)$. The number of distinct
possibilities for $\mathcal{H}(A, x)$ as $x$ varies is at most $\prod_{C \subset A} t_C$. For each one such that $x$ is
irregular (if $\mathcal{H}(A, x) = \mathcal{H}(A, y)$ and $x$ is irregular then $y$ is irregular) choose a partition of
the hypergraphs $H(A', x)$ as above. In general, it will often happen that $\mathcal{H}(A, x) \neq \mathcal{H}(A, y)$
but $H(A', x) = H(A', y)$, so each hypergraph $H(A', x)$ may be partitioned many times.
However, the number of distinct chains $\mathcal{H}(A, x)$ is at most $T_A = \prod_{C \subset A} t_C$, so we can find
a common refinement of all the partitions of $H(A', x)$ into at most $3^{r'T_A}$ sets.

For each $A' \subset A$ of size $s - 1$ let $\mathcal{Q}(A')$ be the union of all these common refinements,
over all the different sets $H(A', x)$. There are at most $T_{A'}$ of these sets, each partitioned
into at most $3^{r'T_A}$ sets, so $\mathcal{Q}(A')$ is a partition of $K(A')$ into at most $T_{A'}3^{r'T_A}$ sets, and it
refines the partition $\mathcal{P}(A')$. For all other sets $C$, let $\mathcal{Q}(C) = \mathcal{P}(C)$.

By Lemma 8.1, given any irregular $x$, the mean-square density of $H(A, x)$ with respect
to the partition of $H_*(A, x)$ that is induced by the refined partitions of the hypergraphs
$H(A', x)$ is still at least $\delta_{A,x}^2 + \eta_s^2/32$. As for a regular $x$, Lemma 8.1 tells us that the
mean-square density of $H(A, x)$ with respect to the refined partition of $\mathcal{H}(x)_*(A)$ is still
at least $\delta_{A,x}^2$.

Let $\sigma_A(\mathcal{P})$ be the mean-square density of the partition $\mathcal{P}(A)$ with respect to the
partition of $K(A)$ into weak equivalence classes coming from the partitions $\mathcal{P}(C)$. Let
$\sigma_A(\mathcal{Q})$ be the mean-square density of $\mathcal{P}(A) = \mathcal{Q}(A)$ with respect to the partition of $K(A)$
arising from $\mathcal{Q}$ in the same way. By the remark preceding Lemma 8.1, $\sigma_A(\mathcal{P})$ is the
expectation of $\delta_{A,y}$ over all sequences $y = (y_1, \dots, y_r)$. Let us write this as $\delta_{A,y}(\mathcal{P})$ since
it depends on the system of partitions $\mathcal{P}(C)$. Thus, $\sigma_A(\mathcal{P})$ is the expectation of $\delta_{A,x}(\mathcal{P})$
and similarly for $\mathcal{Q}$.

What we have just shown is that if x is irregular, then IE[5/i.y(Q) |y G H{A,x)] is 
at least 5A,yiVY + ^7^/32, which equals ¥.[5A,y{V)\y G H{A, x)] + 77^/32. If x is regular, 
then this conditional expectation is at least dA,y{'P)'^, or K[dA,yi'P)\y ^ H{A,x)]. Since 
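the probability that $x$ is irregular is at least $\gamma$, this shows that $\mathbb{E}[\delta_{A,y}(\mathcal{Q})] \geq \mathbb{E}[\delta_{A,y}(\mathcal{P})] +
\gamma\eta_s^2/32$. In other words, $\sigma_A(\mathcal{Q}) \geq \sigma_A(\mathcal{P}) + \gamma\eta_s^2/32$.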

To summarize: if the conclusion of Theorem 7.3 is not true for the partitions $\mathcal{P}(C)$
then there is a set $A$ of size $s \leq k$ and a system of refinements $\mathcal{Q}(C)$ such that $\mathcal{Q}(C) = \mathcal{P}(C)$
except when $C$ is a subset of $A$ of size $s - 1$, and such that $\sigma_A(\mathcal{Q}) \geq \sigma_A(\mathcal{P}) + \gamma\eta_s^2/32$. For
a general $C$, we have $\sigma_C(\mathcal{Q}) \geq \sigma_C(\mathcal{P})$ except if $C \subset A$ and $|C| = s - 1$. This is because if
$C$ is any other set, then $\mathcal{Q}(C) = \mathcal{P}(C)$ and all other partitions have either been refined or
stayed the same. Thus, the lemma is proved. □

To complete the proof of Theorem 7.3, we must argue that this process of successive 
refinement cannot be iterated for ever. 

Imagine, then, that we are trying to find an infinite sequence of refinements of the
kind we are given by Lemma 9.1. The difficulty we face is that the mean-square densities
$\sigma_C(\mathcal{P})$ tend to increase, and there is always one set $A$ for which $\sigma_A(\mathcal{P})$ increases fairly
substantially. Our only hope is that for the subsets $C$ of $A$ obtained by removing one
element, the mean-square densities can drop considerably.

The trouble with that, however, is that the only way of getting the mean-square
density $\sigma_C(\mathcal{P})$ to drop is by getting the mean-square density $\sigma_A(\mathcal{P})$ of some larger set $A$ to
increase.

To see why this observation leads to a proof, suppose that we do indeed have an
infinite sequence of refinements of the kind given to us by Lemma 9.1. Then there must be
a set $A$ of maximal cardinality $s$ that is used infinitely many times. It follows that there
must be some point in the sequence after which $A$ is used infinitely many times but no set
of larger cardinality is ever used. After that point, the only partitions $\mathcal{P}(C)$ that change
are for sets $C$ of cardinality less than $s$, by (i) of Lemma 9.1. It follows from (ii) that
after that point the quantity $\sigma_A(\mathcal{Q})$ increases infinitely often by an amount that does not
change as the iteration proceeds. This is a contradiction, since $\sigma_A(\mathcal{Q})$ is bounded above
by 1. The proof of the regularity lemma is complete.

A careful examination of the above argument shows that the bound that arises from
it increases by one level in the Ackermann hierarchy each time $k$ increases by 1, except at
the jump from the trivial case $k = 1$ to the first non-trivial case $k = 2$, when we go from
nothing to a bound of tower type. In particular, since we shall need $k$-uniform hypergraphs
to prove the multidimensional Szemerédi theorem for sets of size $k + 1$, our bound for that
theorem is of Ackermann type. The only cases where better bounds are known are the
one-dimensional case, which is treated in [G1], and the case of sets of size 3, where a trebly
exponential bound was obtained by Shkredov [S].

§10. Hypergraphs with few simplices. 

Now that we have established counting and regularity lemmas we have the tools necessary
to prove the generalization of Theorems 1.3 and 1.6 to $k$-uniform hypergraphs.

Theorem 10.1. Let $k$ be a positive integer. Then for every $\alpha > 0$ there exists $c > 0$ with
the following property. Let $H$ be a $(k+1)$-partite $k$-uniform hypergraph with vertex sets
$X_1, \dots, X_{k+1}$, and let $N_i$ be the size of $X_i$. Suppose that $H$ contains at most $c\prod_{i=1}^{k+1} N_i$
simplices. Then for each $i \leq k + 1$ one can remove at most $\alpha\prod_{j \neq i} N_j$ edges of $H$ from
$\prod_{j \neq i} X_j$ in such a way that after the removals one is left with a hypergraph that is
simplex-free.

Proof. For each subset $A \subset [k+1]$ of size at most $k$, define a partition $\mathcal{P}(A)$ of $K(A)$ as
follows. If $|A| < k$ then $\mathcal{P}(A)$ consists of the single set $K(A)$. If $|A| = k$ then it consists
of the sets $H(A)$ and $K(A) \setminus H(A)$. Now apply Theorem 7.3 to this system of partitions,
with $\mathcal{J} = [k+1]^{(\leq k)}$ and $\epsilon = \min\{|\mathcal{J}|^{-1}/2, \alpha/2\}$, obtaining for each $A \in \mathcal{J}$ a partition
$\mathcal{Q}(A)$ of $K(A)$ into $m_A$ sets.

If $x = (x_1, \dots, x_{k+1}) \in X_1 \times \dots \times X_{k+1}$ and $\mathcal{H}(x)$ is not $(\epsilon, \mathcal{J}, k)$-quasirandom, then
there must be some $A$ of size $s \leq k$ such that $H(A, x)$ is not $\eta_s$-quasirandom relative to
$\mathcal{H}(x)$. There must be some $i$ such that $i \notin A$, and if $(y_1, \dots, y_{k+1})$ is another sequence
such that $y_j = x_j$ when $j \neq i$, then $H(A, y)$ will also not be $\eta_s$-quasirandom relative to
$\mathcal{H}(y)$. Therefore, since $\mathcal{H}(x)$ is $(\epsilon, \mathcal{J}, k)$-quasirandom with probability at least $1 - \epsilon$, there
are at most $\epsilon\prod_{j \neq i} N_j$ elements of $\prod_{j \neq i} X_j$ that can be extended to sequences $x$ such that
$\mathcal{H}(x)$ is not $(\epsilon, \mathcal{J}, k)$-quasirandom. Remove from $H$ any such element.

Let $\gamma$ be defined by $\gamma \sum_{i=1}^{k} \binom{k+1}{i} = \alpha/2$. Lemma 8.2 tells us that if $x = (x_1, \dots, x_{k+1})$
is chosen randomly, then with probability at least $1 - \alpha/2$, we have $\delta_{A,x} \geq \gamma m_A^{-1}$ for every
$A \in [k+1]^{(\leq k)}$. Again, the event that this happens for a particular $A$ does not depend on
the $x_i$ with $i \notin A$. So for each $i$ there are at most $(\alpha/2)\prod_{j \neq i} N_j$ elements of $\prod_{j \neq i} X_j$ that
can be extended to sequences $x$ for which $\delta_{A,x} < \gamma m_A^{-1}$ for some $A \subset [k+1]$ with $i \notin A$.
Once again, remove all such elements from $H$.

For each $i$ we have removed at most $\alpha\prod_{j \neq i} N_j$ elements from $H \cap \prod_{j \neq i} X_j$. It remains
to show that in the process we have either removed all simplices from $H$, or else, for some
$c > 0$ that depends on $\alpha$ only, there were at least $c\prod_j N_j$ simplices to start with.

Suppose, then, that after the removals there is still a simplex $x = (x_1, \dots, x_{k+1})$, and
consider the chain $\mathcal{H}(x)$. Then for every $A \subset [k+1]$ of size $k$ the following statements are
true. First, the set $x(A)$ is an element of $H$ (or else $x$ would not be a simplex). Second,
the hypergraph $H(A, x)$ is a subset of $H$ (since $x(A) \in H$ and the partition into strong
equivalence classes resulting from $\mathcal{Q}$ refines the partition $\mathcal{P}$). Third, $\delta_{C,x} \geq \gamma m_C^{-1}$ for
every $C \subset A$ (or else we would have removed $x(A)$ from $H$). Finally, the chain $\mathcal{H}(x)$ is
$(\epsilon, \mathcal{J}, k)$-quasirandom (or else for some $A$ of size $k$ we would have removed $x(A)$ from $H$).

We now apply Corollary 5.2, the counting lemma for quasirandom chains. It implies
that the number of simplices in the chain $\mathcal{H}(x)$, which is the same as the number
of homomorphisms from $\mathcal{J}$ to $\mathcal{H}(x)$, is at least $\prod_j N_j \prod_{A \in \mathcal{J}} \delta_{A,x}$, which is at least
$\prod_j N_j \prod_{A \in \mathcal{J}} \gamma m_A^{-1}$. But $\gamma$ and the $m_A$ depend on $\alpha$ and $k$ only, so the result is proved. □

Finally, let us deduce from this a multidimensional Szemeredi theorem. 

Theorem 10.2. Let $\delta > 0$ and $k \in \mathbb{N}$. Then, if $N$ is sufficiently large, every subset $A$ of
the $k$-dimensional grid $\{1, 2, \dots, N\}^k$ of size at least $\delta N^k$ contains a set of points of the
form $\{a\} \cup \{a + de_i : 1 \leq i \leq k\}$, where $e_1, \dots, e_k$ is the standard basis of $\mathbb{R}^k$ and $d$ is a
non-zero integer.

Proof. Suppose that $A$ is a subset of $\{1, 2, \dots, N\}^k$ of size $\delta N^k$, and that $A$ contains no
configuration of the kind claimed. Define a $(k+1)$-partite $k$-graph $F_k$ with vertex sets
$X_1, \dots, X_{k+1}$ as follows. If $j \leq k$ then the elements of $X_j$ are hyperplanes of the form
$P_{j,m} = \{(x_1, \dots, x_k) : x_j = m\}$ for some integer $m \in \{1, 2, \dots, N\}$. If $j = k + 1$ then they
are hyperplanes of the form $Q_m = \{(x_1, \dots, x_k) : x_1 + \dots + x_k = m\}$ where $m$ is an integer
between $k$ and $kN$. The edges of $F_k$ are sets of $k$ hyperplanes from different sets $X_j$ that
intersect in a point of $A$.

If $F_k$ contains a simplex with vertices $P_{j,m_j}$ and $Q_m$, then the points $(m_1, \dots, m_k)$
and $(m_1, \dots, m_k) + (m - \sum_{i=1}^{k} m_i)e_j$ all belong to $A$. This gives us a configuration of the
desired kind except in the degenerate case where $m = \sum_{i=1}^{k} m_i$, which is the case where all
$k + 1$ hyperplanes have a common intersection. By our assumption on $A$, all the simplices
in $F_k$ are therefore degenerate ones of this kind, which implies that there are at most $\delta N^k$
of them.

Now $|X_i| = N$ if $i \leq k$ and $|X_{k+1}| = kN$. We can therefore apply (the contrapositive
of) Theorem 10.1 with $c = N^{-1}k^{-1}$. If $N$ is sufficiently large, then the resulting $\alpha$ is smaller
than $\delta/2k^2$, which implies that we can remove fewer than $\delta N^k$ edges from the hypergraph $F_k$,
and thereby remove all simplices. However, every edge of a degenerate simplex determines
the point of intersection of the $k + 1$ hyperplanes and hence the simplex itself. It follows
that one must remove at least $\delta N^k$ edges to get rid of all simplices. This contradiction
proves the theorem. □
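
The correspondence between simplices of $F_k$ and configurations $\{a\} \cup \{a + de_i\}$ can be seen directly on a small example. The Python sketch below enumerates the simplices by brute force; it is only an illustration, and the sets $A$ used, the value $N = 3$, $k = 2$, and the function name are assumptions made here. A simplex is recorded as a pair consisting of $(m_1, \dots, m_k)$ and $m$, and it is degenerate exactly when $m = m_1 + \dots + m_k$.

```python
from itertools import product

def simplices(A, N, k):
    """All simplices of the hypergraph F_k built from A inside {1..N}^k.
    The hyperplanes P_{1,m_1}, ..., P_{k,m_k} meet in the point ms, and
    replacing P_{j,m_j} by Q_m gives the point ms + d*e_j, d = m - sum(ms).
    A simplex needs all k + 1 of these intersection points to lie in A."""
    A = set(A)
    found = []
    for ms in product(range(1, N + 1), repeat=k):
        if ms not in A:
            continue
        for m in range(k, k * N + 1):
            d = m - sum(ms)
            shifted = [tuple(v + (d if i == j else 0) for i, v in enumerate(ms))
                       for j in range(k)]
            if all(p in A for p in shifted):
                found.append((ms, m))
    return found

N, k = 3, 2
# A set containing the corner {(1,1), (2,1), (1,2)} (here d = 1).
A_with_corner = {(1, 1), (2, 1), (1, 2), (3, 3)}
all_simplices = simplices(A_with_corner, N, k)
nondegenerate = [(ms, m) for ms, m in all_simplices if m != sum(ms)]

# A corner-free set: every simplex is degenerate, one for each point.
A_corner_free = {(1, 1), (2, 2), (3, 3)}
degenerate_only = simplices(A_corner_free, N, k)
```

In the first set the single non-degenerate simplex corresponds to the corner at $(1, 1)$ with $d = 1$; in the second, the number of simplices equals the number of points, which is the count used in the proof.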

The above result is a special case of the multidimensional Szemeredi theorem, but it 
is in fact equivalent to the whole theorem. This is a well-known observation. We give a 
(slightly sketchy) proof below. 

Theorem 10.3. For every $\delta > 0$, every positive integer $r$ and every finite subset $X \subset \mathbb{Z}^r$
there is a positive integer $N$ such that every subset $A$ of the grid $\{1, 2, \dots, N\}^r$ of size at
least $\delta N^r$ has a subset of the form $a + dX$ for some positive integer $d$.

Proof. It is clearly enough to prove the result for sets $X$ such that $X = -X$, so all we
actually need to ensure is that $d \neq 0$. A simple averaging argument shows that we may
also assume that $X$ is not contained in any $(r-1)$-dimensional subspace of $\mathbb{R}^r$. Let the
cardinality of $X$ be $k + 1$. Let $\phi$ be an affine map that defines a bijection from the set
$\{0, e_1, \dots, e_k\} \subset \mathbb{R}^k$ to $X$, regarded as a subset of $\mathbb{R}^r$. Another simple averaging argument
allows us to find a grid $\{1, 2, \dots, M\}^k$, where $M$ tends to infinity with $N$, as well as a
point $z \in \mathbb{Z}^r$ and a constant $\eta > 0$ depending on $\delta$ and $X$ only, such that $z + \phi(x) \in A$
for at least $\eta M^k$ points $x$ in $\{1, 2, \dots, M\}^k$. Let $B$ be the set of points with this property.
Thus, $B$ has density at least $\eta$ and Theorem 10.2 shows that $B$ contains a set of the form
$w + c\{0, e_1, \dots, e_k\}$. But then $z + \phi(w + c\{0, e_1, \dots, e_k\})$ is a set of the form $a + dX$ and
is also a subset of $A$. □

Concluding Remarks. 

This paper has a slightly strange history, which may be worth briefly outlining here. 
The main results were first obtained in 2003, and a preprint circulated. I am very grateful 
indeed to Yoshiyasu Ishigami, who read this preprint carefully and found an error which, 
though it did not invalidate the approach, occurred early in the argument and therefore 
necessitated changes throughout the paper. While thinking about how to go about this 
rewriting, I discovered a much simpler proof of the counting lemma, and in the end it 
seemed best, even if depressing, to rewrite the whole paper (including the regularity part) 
from scratch. 

I owe a second debt of gratitude to the two referees, who also read the paper with 
great care. Not only did they save me from a large number of minor errors, but they also 
made valuable suggestions about the presentation of the paper. While thinking about how 
to respond to these suggestions I realized, with a certain sense of déjà vu, that the sections
on the counting lemma could still be greatly improved. The argument that now appears is 
essentially the same, but the notation has been changed and the triple induction slightly 
reorganized, with the result that the proof is now shorter, clearer, and easier to identify 
with the arguments presented in the special cases in §2. That section, as was mentioned 
in the footnote at the beginning of it, was not in the original version of the paper. The 
excellent idea of presenting some small examples was suggested by one of the referees. 

In 2005, Tao [T] gave another proof of the main result of this paper (Theorem 10.1), 
and indeed of a slight generalization. He too proved regularity and counting lemmas. His 
methods were more closely related to those of Nagle, Rödl, Schacht and Skokan, but he
introduced some new ideas and a different language that led to considerably shorter proofs 
than theirs. 